feature: experiment with edit prediction #50
Comments
I think this is what I'll work on next. If you'd like to contribute or play around with this, let me know in this issue!
I'm pretty sure Windsurf sends, on each edit (probably with some throttling), a few lines around the cursor. If you place the related code at some distance, say 20 blank lines away, the suggestion will not appear. Additionally, after seeing the ghost-text suggestion for the first time, you can move the cursor some distance away, disable the internet, and then click back where the suggestion was: the ghost text will reappear, and applying the change will still work. This suggests that it makes a request to the server and saves the last response (probably a diff?). As soon as you start typing anything, the suggestion disappears.

You can also try another trick with the internet disabled: add some text (which causes the suggestion to disappear), then press Ctrl+Z to revert to the state where the suggestion was visible; it still works. This indicates that it has some sort of cache. What are the possible strategies for choosing code to send to an LLM?
About UX:
Thanks for the sleuthing / reverse engineering!
I played around with smaller models (mistral-small, gemini-flash(-lite)), and I'm now pretty sure that the model for this task needs a fast response time. Of these three, the best in terms of speed/quality is mistral-small (latest). A small benchmark to get a sense of what response times to expect:
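For reference, a rough sketch of the kind of benchmark I mean, assuming an OpenAI-compatible chat completions endpoint (the endpoint URL, model name, prompt, and env var below are placeholders, not the exact script used):

```typescript
// Rough latency benchmark sketch. Assumes an OpenAI-compatible chat
// completions endpoint; the URL, model name, and API key env var are
// placeholders and may need adjusting.
const ENDPOINT = "https://api.mistral.ai/v1/chat/completions";
const MODEL = "mistral-small-latest";

async function timeOneCompletion(snippet: string): Promise<number> {
  const start = performance.now();
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.MISTRAL_API_KEY}`,
    },
    body: JSON.stringify({
      model: MODEL,
      messages: [
        { role: "system", content: "Rewrite the snippet with the next likely edit applied." },
        { role: "user", content: snippet },
      ],
    }),
  });
  await res.json(); // wait for the full (non-streaming) response
  return performance.now() - start;
}

// Run a few requests and report the average, since single samples are noisy.
async function main() {
  const snippet = "function add(a, b) {\n  return a + b\n}\n";
  const samples: number[] = [];
  for (let i = 0; i < 5; i++) {
    samples.push(await timeOneCompletion(snippet));
  }
  const avg = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  console.log(`samples: ${samples.map((ms) => ms.toFixed(0)).join(", ")} ms; avg ${avg.toFixed(0)} ms`);
}

main();
```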
Zed just added this functionality. I think it's interesting, and I'll look into their implementation. They use their own 7B model, which is pretty small but performs quite well. Edit prediction announcement and blog post
I played around a little more, and what I found most challenging is forcing the LLM to give the expected answer. Sometimes it doesn't return the whole block of code as I expect, but only a small part of the changed code. I think this can be fixed to some extent by tweaking the prompt. With the model the Zed editor uses (Zeta, which is open-weight), you don't have such problems, because it's fine-tuned and reliably returns the code inside the specified meta tags. So we could map such diffs to virtual text to show suggestions. cut.mp4
huh, interesting. So this is just going off of the current state of the buffer. I was actually envisioning using the history of edits as part of the context, so the prompt would look something like:
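Something roughly along these lines (a hedged sketch; the tag names, diff format, and wording are illustrative assumptions, not a settled design):

```typescript
// Illustrative only: one way to assemble an edit-prediction prompt from the
// recent edit history plus the area around the cursor. Tag names and the
// before/after format are placeholder choices.
interface Edit {
  file: string;
  before: string[]; // lines as they were before the edit
  after: string[];  // the same region after the edit
}

function buildPrompt(history: Edit[], excerpt: string, cursorLine: number): string {
  const historyText = history
    .map(
      (e) =>
        `<edit file="${e.file}">\n--- before\n${e.before.join("\n")}\n+++ after\n${e.after.join("\n")}\n</edit>`,
    )
    .join("\n");

  return [
    "The user has been making the following edits:",
    historyText,
    `They are now at line ${cursorLine} of this excerpt:`,
    "<excerpt>",
    excerpt,
    "</excerpt>",
    "Predict the next edit and return the rewritten excerpt.",
  ].join("\n");
}
```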
I've been focusing on tracking edits made to buffers with nvim_buf_attach, so that we can provide the before/after lines for the most recent edit. But your experiments seem useful even with a really small context window. One concern I have is the tradeoff between the size of the context and the speed of prediction, versus the quality of the prediction... I definitely think that, given the effort Zed has put into making Zeta fast, it will probably be worthwhile to try and make this as fast as possible, which probably means using a local model and a really small section of the buffer. I'll probably start with:
And we can use that as a foundation. We can then add alternate providers for just inline completion, and experiment with local models, automated triggering, and so on.
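For the tracking side, a minimal sketch of the shape this could take, assuming the before/after lines are captured from an nvim_buf_attach / on_lines notification (names and sizes here are placeholders):

```typescript
// Sketch of a bounded edit-history buffer. The intent is that a buffer-change
// notification (nvim_buf_attach / on_lines) supplies the changed range and the
// new lines, and the previous lines come from our own snapshot of the buffer.
interface RecordedEdit {
  firstLine: number;   // 0-based start of the changed region
  before: string[];    // lines in that region before the change
  after: string[];     // lines in that region after the change
  changedTick: number; // buffer changedtick, for ordering / dedup
}

class EditHistory {
  private edits: RecordedEdit[] = [];
  constructor(private maxEdits = 10) {}

  // Called from the buffer-change handler with the old and new line contents.
  record(edit: RecordedEdit): void {
    this.edits.push(edit);
    if (this.edits.length > this.maxEdits) {
      this.edits.shift(); // keep only the most recent edits
    }
  }

  // The most recent edits, oldest first, ready to go into the prompt.
  recent(n = 3): RecordedEdit[] {
    return this.edits.slice(-n);
  }
}
```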
Actually, the very first tests I did were with examples from the Zeta dataset, and I'm certain that the history of edits helps achieve more accurate results, so it's very important to implement it.
A small context window is sufficient for editing the local area, and I'm sure that with a small history of edits we could achieve good results. Speaking of speed, I managed to get sub-second results using the Codestral model (Mistral); they support output prediction, which increases speed a lot, and you can get between 0.5s and 0.8s in my tests. In a podcast, a Zed developer said they use speculative decoding, because the LLM responds with basically the same code you put into it. I don't know how much the size of the input (context) impacts generation time, but the size of the output is a very strong factor. Sometimes the LLM glitches and, instead of the whole code block, returns only the lines that were changed, and you get something like a 0.3s response time. Supporting local models would be great, though obviously you need a good enough machine. I could only test on my M1 Air, and it's slow :)
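To make the output-prediction point concrete, a hedged sketch: the current code is sent both as the prompt and as the predicted output, so the server can speculatively decode against it and only slow down on the parts that actually change. The `prediction` field below follows the OpenAI-style shape; whether Codestral's endpoint takes exactly this parameter is an assumption.

```typescript
// Sketch of a request using a predicted output. The model's answer will mostly
// repeat the code it was given, so sending that code as the prediction lets the
// server speculatively decode most of the output. The exact parameter shape
// Codestral expects is an assumption here.
async function predictEdit(code: string, apiKey: string): Promise<string> {
  const res = await fetch("https://api.mistral.ai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "codestral-latest",
      messages: [
        { role: "system", content: "Apply the most likely next edit and return the full snippet." },
        { role: "user", content: code },
      ],
      // Most of the output is expected to match the input verbatim.
      prediction: { type: "content", content: code },
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```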
Did you check the docs?
Is your feature request related to a problem? Please describe.
One of the really nice features of Windsurf and Cody is that they do edit prediction, rather than just completion.
I think this works by recording the diff of the current file, as well as some sequence of edits. Then the LLM is prompted to predict the next edit, not just a completion of the current line.
This can either run constantly in the background (which is expensive and uses a lot of tokens) or be triggered manually.
Describe the solution you'd like
Magenta.nvim should record the edits you make to a buffer. Either automatically or when triggered, it should send that history of edits to the LLM and force a tool use to predict the next edit. This edit should be a replace or insert operation.
Magenta should show this edit to you via ghost text in the buffer you're editing, and provide a keybinding to accept the edit. Typing text that's incompatible with the edit should reject it.
I think a really basic version of this would just observe the current diff of the file (possibly using version control, or just snapshotting the buffer periodically), send the diff along with the cursor position to the LLM, and force it to use the replace tool. Then display the replace operation in the buffer via ghost text (though it's not clear how we communicate "removing" text... maybe highlighting, or a ghost strike-through effect? Is a ghost strike-through even possible?)
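As a sketch of the "force it to use the replace tool" step, here is roughly what a forced tool call could look like with the Anthropic SDK (the tool schema, prompt, and model name are illustrative placeholders, not magenta.nvim's actual tool definitions):

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Illustrative sketch of forcing a structured "replace" prediction.
const client = new Anthropic();

const replaceTool = {
  name: "replace",
  description: "Replace a range of lines in the current buffer with new text.",
  input_schema: {
    type: "object" as const,
    properties: {
      startLine: { type: "number", description: "First line to replace (0-based)" },
      endLine: { type: "number", description: "Line after the last replaced line" },
      newText: { type: "string", description: "Replacement text for that range" },
    },
    required: ["startLine", "endLine", "newText"],
  },
};

export async function predictNextEdit(diff: string, cursorLine: number) {
  const response = await client.messages.create({
    model: "claude-3-5-haiku-latest",
    max_tokens: 1024,
    tools: [replaceTool],
    // Force the model to answer with a replace tool call rather than prose.
    tool_choice: { type: "tool", name: "replace" },
    messages: [
      {
        role: "user",
        content: `Recent diff of the file:\n${diff}\n\nCursor is at line ${cursorLine}. Predict the next edit.`,
      },
    ],
  });
  // The forced tool call arrives as a tool_use content block.
  for (const block of response.content) {
    if (block.type === "tool_use") return block.input;
  }
  return null;
}
```

The returned replace operation could then be rendered as ghost text via extmarks, with the keybinding applying it to the buffer.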
Describe alternatives you've considered
I'm not confident this will be fast enough to be useful. Are snapshots sufficient or do we need more granularity, like "user added this text to this line", "user deleted this text from this line" ... more of a narrative structure? Perhaps the full diff and the last edit as a smaller diff would work?
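To make the more granular "narrative" option concrete, it could be something like a small discriminated union alongside periodic snapshots (names here are illustrative):

```typescript
// Illustrative shapes for the two levels of granularity discussed above:
// coarse buffer snapshots vs. a "narrative" stream of fine-grained events.
interface Snapshot {
  kind: "snapshot";
  lines: string[];
  takenAt: number; // changedtick or timestamp
}

type GranularEdit =
  | { kind: "insert"; line: number; text: string }  // "user added this text to this line"
  | { kind: "delete"; line: number; text: string }  // "user deleted this text from this line"
  | { kind: "replace"; line: number; before: string; after: string };

// The context sent to the model could then be the full diff plus the last few
// granular events, per the "full diff and the last edit as a smaller diff" idea.
type EditContext = { fullDiff: string; recent: GranularEdit[] };
```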
Should we also include the conversation from the chat buffer in the context? I fear that it might be too slow... though with caching it could still be fast enough? I think this is probably the best version of this feature, so it might actually be pretty nice.
Also, I'm not sure if this will be fast enough to really be useful. I think the initial version would use maximal context and be manually triggered, though perhaps in the future I could experiment with faster versions (using less context or different models) that are triggered automatically behind the scenes.
Additional context
No response