Replies: 20 comments 2 replies
-
I noticed it happening myself; it kinda occurs like this:
-
I noticed this too, but for me, after pressing enter it prints out the introduction to some random book.
-
+1 here. Sometimes I wait it out and it restarts, but if I see the CPU fall to zero, then it won't budge until I press enter.
-
Same here.
-
Is there a solution for this? I tried using `--mlock` and `-c 2048`, but it still stopped.
-
There are two possible fixes. Some models will use the EOS token to return control to you when they are done answering, so depending on which model you're using, one fix might work better than the other. If in doubt, try
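As an illustration of the two flags that come up later in this thread (a minimal sketch only, not necessarily the exact suggestion being made here; the model path is a placeholder):

```
# -n -1 removes the hard cap on the number of tokens generated per turn
./main -m ./models/13B/model-q4_0.bin -i --reverse-prompt "User:" -n -1

# or, if a model stops too eagerly on the EOS token, additionally ignore it:
./main -m ./models/13B/model-q4_0.bin -i --reverse-prompt "User:" -n -1 --ignore-eos
```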
-
@DannyDaemonic, I just tried adding `-c 2048` with Vicuna 13B and haven't gotten stopped yet. I'll try your suggestion if I hit another stop. Thank you very much.
-
When the context fills up, it has to cut your old history in half and reprocess it all (to make room to remember new stuff). This can also be a slow process, so it's possible the pause you were seeing was simply the system "thinking" and it wasn't actually waiting for you to hit enter. Or it could be a combination.
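A hedged sketch of how that re-processing cost can be reduced from the command line: `--keep` pins the first N prompt tokens so they survive the halving step and less has to be re-evaluated. The numbers and model path below are placeholders, not a recommendation:

```
# keep the first 256 prompt tokens (e.g. the system prompt) across context swaps
./main -m ./models/model-q4_0.bin -i -c 2048 --keep 256
```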
-
@DannyDaemonic, maybe I was wrong; I meant `-n 2048`? Not sure, sorry, I only started playing with this this morning 😄
-
OK, thanks all for your help. The key is, like @DannyDaemonic said: `-n -1`
-
The long pause you get after a long conversation is usually due to the context memory being trimmed and re-evaluated so it has space to continue the conversation. Context length isn't usually limited by your computer's RAM, but rather by how much context the model has been trained to handle.
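To illustrate that last point, a sketch only: set `-c` to the window the model was trained with (2048 for the original LLaMA models) rather than scaling it to however much RAM you have. The model path is a placeholder:

```
# size the context to what the model was trained on, not to available RAM
./main -m ./models/model-q4_0.bin -c 2048 -i
```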
-
@DannyDaemonic, is it possible I experienced a memory leak then? It happened when there was a lot of back-and-forth chatting going on, and then the memory swapping started happening.
-
I'm guessing the "swapping" you are experiencing is the context swapping. It can be a very slow process. It happens once the context fills up, so this will happen after a long conversation.
-
@DannyDaemonic, I see. Today I tried to replicate it again without using `--mlock`, and it won't swap, and the speed difference is almost unnoticeable. I think for now I'd better not use `--mlock` and stick with `-n -1`.
-
This is still happening. I'm using the server, and the browser console tells you the reason for stopping; when I'm getting incomplete sentences or incomplete code I asked for, the debug console says the reason for stopping was eos. When this happens, almost certainly the next couple of sentences get progressively more meaningless, including repeating my own questions, attempts at talking to itself, or an endless stream of a single character, usually `\n` or `#`. If I continue, it starts to produce more and more garbage, completely ignoring my input and replying to absolutely random pieces of the previous conversation. It's almost as if, when the context gets full and it does its context "compression" thing, it butchers the question/reply scheme and the model sees messed-up things in the context.
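For anyone trying to reproduce this, a rough sketch of checking the stop reason directly against the server's `/completion` endpoint; the response field names (`stopped_eos`, `stopped_limit`, `truncated`) are from memory and may differ between server versions:

```
# assumes the server is running on its default port 8080
curl -s http://127.0.0.1:8080/completion \
  -d '{"prompt": "User: write a short haiku\nAssistant:", "n_predict": 128}' \
  | python3 -m json.tool
# inspect fields such as "stopped_eos", "stopped_limit" and "truncated" in the response
```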
-
Maybe some breaking changes were introduced recently? I clearly remember that about a month or two ago I was able to have long conversations with large WizardLM models (in interactive/chat mode), but this morning, after a long break, I downloaded and compiled the latest llama.cpp and re-quantized my model, and now I can only get 1-2 responses before it freezes up and then starts generating random gibberish or talking to itself after I hit enter. I tried prompting with and without the `-n` parameter and tried different models, but I'm yet to find a combination which works.
-
I just ran a couple of tests in "single run" mode from the command line, giving it a prompt of a conversation to continue, with `-n -1` and the ban-EOS flag (can't remember it exactly, I'm on my phone right now), just to see what happens. It continued to generate for a fairly long time, way past the context limit, but at some point it REALLY tried to stop, by saying things like "the end", then generating some more and saying "for real the end", and then doing some more, to the point where it started to sound like an SMS ("ok, bye, I'm going to the shop now, BYE, BYYYYEEEE"), and after that it gave me a copious amount of emoji followed by a continuous stream of non-printable characters, i.e. it went completely off the rails. (But it was genuinely hilarious, like it wanted to let me know it had had absolutely enough, in the "blink twice if you need help" kind of way.) Any prompt, and several different models, all got to the same point eventually during repeated tests; sometimes sooner, sometimes it got quite far. Edit: Wait, now that I think about it, did an AI just imply that I'm a creep? Did I just get friendzoned by an AI?
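For reference, the "ban EOS" flag is presumably `--ignore-eos`; a hedged sketch of what such a single-run test might look like (model path and prompt are placeholders; `$'...'` is bash syntax for embedding real newlines):

```
# non-interactive single run: no token cap, EOS ignored, so it generates well past the context limit
./main -m ./models/model-q4_0.bin -n -1 --ignore-eos \
  -p $'User: Hello!\nAssistant: Hi, how can I help?\nUser: Tell me about llamas.\nAssistant:'
```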
-
With a bit more testing, I think I found that one of the recent commits introduced some changes which are likely the cause of my issues: #2304
-
Using the following parameters with build 916a9ac: `-c 512 -b 1024 -n 1024 --keep 1024 --repeat_penalty 1.1 --color -i -p "my question..."` (see https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)
-
@mattiashallberg, having the same issues. How do you make it "continue" the response without stopping?
-
I noticed that in chat mode the inference often stops mid-sentence and requires the user to press enter to continue. However, this introduces a newline at the end of the context that makes LLaMA terminate what it was saying and give control back to the user with the reverse prompt `User:`. Does someone know why this happens, and whether it would be beneficial to make the Enter keypress just continue inference instead of also adding a `\n` when both `-i` and `--reverse-prompt` are passed? Obviously the `\n` append is skipped only if the termination was not due to a reverse prompt match. Note: this sometimes happens even after very few tokens (5-10).