
feature: experiment with edit prediction #50

Open
1 task done
dlants opened this issue Feb 4, 2025 · 9 comments
Labels
enhancement New feature or request

Comments

@dlants
Owner

dlants commented Feb 4, 2025

Did you check the docs?

  • I have read all the docs

Is your feature request related to a problem? Please describe.

One of the really nice features of Windsurf and Cody is that they do edit prediction, rather than just completion.

I think this works by recording the diff of the current file, as well as some sequence of edits. Then the LLM is prompted to predict the next edit, not just the completion to the current line.

This can either run constantly in the background (which is expensive and uses a lot of tokens) or be triggered manually.

Describe the solution you'd like

Magenta.nvim should record the edits you make to a buffer. Either automatically or when manually triggered, it should send that history of edits to the LLM and force a tool use to predict the next edit. This edit should be a replace or insert operation.

Magenta should show this edit to you via ghost text in the buffer you're editing, and provide a keybinding to accept the edit. Typing text that's incompatible with the edit should reject it.

I think a really basic version of this would just observe the current diff of the file (which could come from version control, or from snapshotting the buffer periodically). Then send the diff along with the cursor position to the LLM, and force it to use the replace tool. Then display the replace operation in the buffer via ghost text (though it's not clear how we communicate "removing" text... maybe highlighting, or a ghost strike-through effect? Is a ghost strike-through even possible?)
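A ghost strike-through does appear to be possible: Neovim highlight groups support strikethrough, and extmarks can both overlay that highlight on the range that would be removed and render the proposed replacement as dimmed virtual lines. A minimal Lua sketch (the names and rendering choices here are illustrative, not a committed design):

```lua
-- Sketch: render a predicted replacement of lines start_row..end_row
-- (0-indexed, end-exclusive) in buffer `buf` as ghost text.
local ns = vim.api.nvim_create_namespace("magenta_edit_prediction")

-- Highlight group used to strike through the text the prediction would remove.
vim.api.nvim_set_hl(0, "MagentaGhostStrike", { strikethrough = true, fg = "#808080" })

local function show_prediction(buf, start_row, end_row, replacement_lines)
  vim.api.nvim_buf_clear_namespace(buf, ns, 0, -1)

  -- Strike through the to-be-replaced range.
  vim.api.nvim_buf_set_extmark(buf, ns, start_row, 0, {
    end_row = end_row,
    end_col = 0,
    hl_group = "MagentaGhostStrike",
  })

  -- Show the proposed replacement as dimmed virtual lines below the range.
  local virt_lines = {}
  for _, line in ipairs(replacement_lines) do
    table.insert(virt_lines, { { line, "Comment" } })
  end
  vim.api.nvim_buf_set_extmark(buf, ns, end_row - 1, 0, { virt_lines = virt_lines })
end
```

Accepting the prediction would then just be a call to nvim_buf_set_lines over the same range, and clearing the namespace (or typing incompatible text) would reject it.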

Describe alternatives you've considered

I'm not confident this will be fast enough to be useful. Are snapshots sufficient or do we need more granularity, like "user added this text to this line", "user deleted this text from this line" ... more of a narrative structure? Perhaps the full diff and the last edit as a smaller diff would work?

Should we also include the conversation from the chat buffer in the context? I fear that it might be too slow... though with caching it could still be fast enough? I think this is probably the best version of this and so might actually be pretty nice.

Also, I'm not sure if this will be fast enough to really be useful. I think the initial version should use maximal context and be manually triggered, though perhaps in the future I could experiment with faster versions (using less context or different models) that are triggered automatically behind the scenes.

Additional context

No response

@dlants dlants added the enhancement New feature or request label Feb 4, 2025
@dlants
Owner Author

dlants commented Feb 4, 2025

Looks like nvim_buf_attach combined with some snapshots may be the ticket here.
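For reference, a rough sketch of that approach (all names hypothetical, not actual plugin code): attach to the buffer, keep a snapshot of its lines, and record the before/after text of the most recent change.

```lua
-- Sketch: track the most recent edit per buffer via nvim_buf_attach plus a snapshot.
local snapshots = {} -- bufnr -> buffer lines as of the previous change
local last_edit = {} -- bufnr -> { firstline, before = {...}, after = {...} }

local function track(buf)
  snapshots[buf] = vim.api.nvim_buf_get_lines(buf, 0, -1, false)

  vim.api.nvim_buf_attach(buf, false, {
    on_lines = function(_, bufnr, _, firstline, lastline, new_lastline)
      local old = snapshots[bufnr]
      -- on_lines is 0-indexed with an exclusive end; list_slice is 1-indexed inclusive.
      local before = vim.list_slice(old, firstline + 1, lastline)
      local after = vim.api.nvim_buf_get_lines(bufnr, firstline, new_lastline, false)

      last_edit[bufnr] = { firstline = firstline, before = before, after = after }

      -- Refresh the snapshot so the next event diffs against the current state.
      snapshots[bufnr] = vim.api.nvim_buf_get_lines(bufnr, 0, -1, false)
    end,
  })
end
```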

@dlants
Owner Author

dlants commented Feb 4, 2025

I think this is what I'll work on next. If you'd like to contribute or play around with this, let me know in this issue!

@dlants dlants self-assigned this Feb 4, 2025
@angrypie

I'm pretty sure Windsurf sends, on each edit (probably with some throttling), a few lines around the cursor. If you place the related code at some distance, say 20 blank lines away, the suggestion will not appear.

Additionally, after observing the ghost-text suggestion for the first time, you can move the cursor away to some distance, disable the internet, and then click back where the suggestion was — the ghost text will reappear, and changes will still be possible. This suggests that it makes a request to the server and saves the last response (probably a diff?). As soon as you start typing anything, the suggestion disappears.

You can also do a trick with the internet disabled: add some text (which causes the suggestion to disappear), then use Ctrl+Z to revert to the state where the suggestion was visible — it will still work. This indicates that it has some sort of cache.

What are the possible strategies for choosing code to send to an LLM?

  1. Simply send N lines around the cursor and trigger suggestions only within those N lines.
  2. Use the AST to determine the current context and extract as much useful information as possible. This ensures that relevant information about the enclosing scope is included, whereas the first approach could cut things off one line short of the method signature. We could also cache suggestions per AST node. (A sketch of both strategies follows this list.)
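To make the two strategies concrete, here is a rough Lua sketch of both (purely illustrative, not magenta.nvim code); the AST variant uses Neovim's built-in treesitter API and falls back to a fixed window when no parser or enclosing function is found:

```lua
-- Strategy 1: send N lines around the cursor.
local function window_context(buf, n)
  local row = vim.api.nvim_win_get_cursor(0)[1] - 1 -- 0-indexed
  local first = math.max(0, row - n)
  return vim.api.nvim_buf_get_lines(buf, first, row + n + 1, false)
end

-- Strategy 2: walk up the treesitter tree to the enclosing function/method
-- and send that whole node, so the scope is never cut short.
local function ast_context(buf)
  local ok, node = pcall(vim.treesitter.get_node)
  if not ok or not node then
    return window_context(buf, 20) -- no parser: fall back to a fixed window
  end
  while node do
    local t = node:type()
    if t:match("function") or t:match("method") then
      local start_row, _, end_row, _ = node:range()
      return vim.api.nvim_buf_get_lines(buf, start_row, end_row + 1, false)
    end
    node = node:parent()
  end
  return window_context(buf, 20)
end
```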

About UX:

  1. I assume Windsurf isn't using Claude for this task; it's probably their Base model. It would be a good idea to add the ability to choose a cheaper "weaker" model specifically for this task.
  2. Setting up both manual and automatic triggering would be nice.

@dlants
Owner Author

dlants commented Feb 10, 2025

Thanks for the sleuthing / reverse engineering!

@angrypie

angrypie commented Feb 14, 2025

I played around with smaller models (mistral-small, gemini-flash(-lite)) and I'm now pretty sure that the model for this task needs a fast response time. Of these three, the best one in terms of speed/quality is mistral-small (latest).

Small benchmark to understand what kind of response time to expect:

average (ms)
  mistralSmall: 799
  flash2:       1335
  flashLite:    600

response times (ms)
  mistralSmall: [ 833, 759, 931, 734, 753, 894, 751, 786, 831, 718 ]
  flash2:       [ 1494, 1133, 1197, 1553, 1157, 1610, 1457, 1208, 1347, 1197 ]
  flashLite:    [ 587, 714, 704, 554, 571, 534, 535, 586, 604, 609 ]
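A harness along these lines is enough to collect numbers like the above (a sketch, not the code I actually used; request stands in for whatever provider call is being measured):

```lua
-- Sketch of a latency harness: time `request()` n times and report milliseconds.
local function benchmark(request, n)
  local hrtime = (vim.uv or vim.loop).hrtime
  local times = {}
  for i = 1, n do
    local start = hrtime()
    request() -- blocking call to the model being measured
    times[i] = math.floor((hrtime() - start) / 1e6) -- ns -> ms
  end
  local sum = 0
  for _, t in ipairs(times) do
    sum = sum + t
  end
  return { average = math.floor(sum / n), samples = times }
end
```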

Zed just added this functionality; I think it's interesting, and I will look into the implementation. They use their own 7B model, which is pretty small but performs quite well. Edit prediction announcement and blog post

@angrypie

I did some testing, and it seems possible to make something out of a request to a small model and a diff.
Of course, the delay should be taken into account: a few lines of prompt gave me a sub-second response time, while a prompt the size of the one in the image below averaged 1.6 seconds.

[image: the larger prompt used in the test]

@angrypie

I played around a little more, and what I found most challenging is forcing the LLM to give the expected answer. Sometimes, it doesn't return the whole block of code as I expect but only a small part of the changed code. I think this can be fixed to some extent by tweaking the prompt.

With the model that the Zed editor uses (Zeta, which is open-weight), you don't have such problems because it's fine-tuned and reliably returns code specified inside meta tags.

So, we could map such diffs to virtual text to show suggestions.
Branch with experiment: https://github.com/angrypie/magenta.nvim/tree/edit-predict-exp

cut.mp4

@dlants
Owner Author

dlants commented Feb 19, 2025

huh, interesting. So this is just going off of the current state of the buffer. I was actually envisioning using the history of edits as part of the context, so the prompt would look something like:

the user is editing the buffer <buffername>

The last edit the user made was changing the following lines:
Lines before edit:
<lines before edit>
Lines after edit:
<lines after edit>

The current contents of the buffer are:
<buffer content with cursor marker>

I've been focusing on tracking edits done to buffers with nvim_buf_attach to allow one to provide the before/after lines for the most recent edit.

But your experiments seem useful even with a really small context window. One concern that I have is the tradeoff between the size of the context and the speed of prediction vs. the quality of the prediction... Based on the effort Zed has put into making Zeta fast, I think it will probably be worthwhile to try to make this as fast as possible, which probably means using a local model and a really small section of the buffer.

I'll probably start with:

  • using the recent edit diff & the full visible buffer as context
  • using the existing providers via forced tool use (as is done with inline edit mode)
  • using a manual trigger
  • using ghost-text to preview the edit

And we can use that as a foundation. We can then add alternate providers for plain inline completion, and experiment with local models, automated triggering, and so on. (See below for a sketch of what the forced tool call could look like.)
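For the forced tool use, the tool definition could be shaped roughly like this (a guess, expressed as a Lua table in the style of Anthropic's tool schema; the names are hypothetical and the real inline-edit tool may differ):

```lua
-- Hypothetical tool definition for a forced "predict_edit" call.
local predict_edit_tool = {
  name = "predict_edit",
  description = "Predict the user's next edit as a single replace operation.",
  input_schema = {
    type = "object",
    properties = {
      startLine = { type = "integer", description = "First line to replace (1-indexed)" },
      endLine = { type = "integer", description = "Last line to replace (inclusive)" },
      replacement = { type = "string", description = "Text to put in place of that range" },
    },
    required = { "startLine", "endLine", "replacement" },
  },
}

-- The request would then pin the model to this tool, e.g. Anthropic's
-- tool_choice = { type = "tool", name = "predict_edit" }.
```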

@angrypie

angrypie commented Feb 19, 2025

huh, interesting. So this is just going off of the current state of the buffer. I was actually envisioning using the history of edits as part of the context

Actually, the very first tests I did were with examples from the Zeta dataset, and I am certain that the history of edits helps achieve more accurate results, so it's very important to implement it.
Here is the file I use to experiment.

But your experiments seem useful even with a really small context window. One concern that I have is the tradeoff between the size of the context and the speed of prediction vs. the quality of the prediction... Based on the effort Zed has put into making Zeta fast, I think it will probably be worthwhile to try to make this as fast as possible, which probably means using a local model and a really small section of the buffer.

A small context window is sufficient for editing the local area, and I am sure that with a small history of edits we could achieve good results. Speaking of speed, I managed to achieve sub-second results using the Codestral model (Mistral); they support output prediction, which increases speed a lot, and you can get between 0.5s and 0.8s in my tests. In a podcast, a Zed developer said that they use speculative decoding, because the LLM responds with basically the same code that you put into it.

I don't know how much the size of the input (context) impacts generation time, but the size of the output is a very strong factor. Sometimes the LLM glitches and, instead of the whole code block, returns just the lines that were changed, and you get something like a 0.3s response time.

Supporting local models would be great; obviously, you have to have a good enough machine. I could only test with my M1 Air, and it's slow.
