Reverse prompt is sometimes ignored. #292
I haven't found a consistent pattern to reproduce this, but sometimes the model will continue outputting text even after it has printed the reverse prompt. If colors are enabled, they change as if the new text were user input, but it is generated by the model. After this happens, it might or might not revert to its proper behavior once it finds the reverse prompt again.

I have noticed the color change doesn't always happen right on the prompt; sometimes it happens a few words before it. I don't know enough about how this code works yet to speculate, but in case this has something to do with parallelism, I'm using -t 16.

Comments
I think this might have to do with the way the program deals with a large token budget. Related to this, there is the issue of interactive mode dropping back out to the shell after the number of requested tokens has been consumed. This makes people use large token limits to begin with. I think we could handle running out of tokens when in interactive mode differently, similar to what alpaca.cpp does: instead of closing the program, reset the counter and go back into interactive mode. I will try to explore both issues later today and see if I can make a pull request.
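For what it's worth, a minimal sketch of that behavior, assuming a generation loop and hypothetical names (`n_remain`, `is_interacting`) rather than the actual variables in main.cpp:

```cpp
// Hypothetical sketch of the proposed behavior: when the token budget
// runs out in interactive mode, reset it and hand control back to the
// user instead of exiting. Names are illustrative, not main.cpp's.
struct gen_state {
    int  n_remain;        // tokens left in the current budget
    bool interactive;     // was interactive mode requested?
    bool is_interacting;  // are we currently waiting for user input?
};

// Called at the top of the generation loop; returns false when the
// program should stop generating entirely.
bool handle_token_budget(gen_state & s, int n_predict) {
    if (s.n_remain > 0) {
        return true;   // budget not exhausted, keep sampling
    }
    if (!s.interactive) {
        return false;  // current behavior: out of tokens, we are done
    }
    // Proposed: reset the counter and drop back to the user,
    // similar to what alpaca.cpp does.
    s.n_remain       = n_predict;
    s.is_interacting = true;
    return true;
}
```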
For this, we could add a flag that forbids the sampler from drawing control tokens like begin/end of stream.
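A sketch of what such a flag could do, assuming access to the raw logits before the sampler runs; the flag name, token IDs, and logits layout here are placeholders, not llama.cpp's actual API:

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch: if a flag like --ignore-eos is set, push the
// probability of control tokens to zero by masking their logits
// before sampling.
void mask_control_tokens(std::vector<float> & logits,
                         const std::vector<int> & control_token_ids,
                         bool ignore_control) {
    if (!ignore_control) {
        return;
    }
    for (int id : control_token_ids) {
        // A logit of -inf becomes a softmax probability of exactly 0,
        // so the sampler can never draw this token.
        logits[id] = -INFINITY;
    }
}
```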
Masking past context might be an issue, but I think it's useful to sample end-of-stream tokens if the model decides a particular sequence should end early. The main issue here (which is very easy to solve) is that ending the stream (or running out of token budget) shouldn't mean closing the program when we are in interactive mode; it should let the user input more text instead.

I think I have figured out where the bug with the sampler not stopping when it should is. It seems the program keeps sampling new tokens regardless of the interactive state. If I'm not mistaken, the conditions in main.cpp:937 and main.cpp:995 can be true at the same time. I'll see if I can fix it, test it and make a pull request.
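I haven't traced the whole flow, but the shape of the fix might be to make the two branches mutually exclusive, something like this (hypothetical structure; the real conditions are the ones around main.cpp:937 and main.cpp:995):

```cpp
// Hypothetical sketch: sampling and waiting for user input should be
// mutually exclusive within one loop iteration, so gate sampling on
// !is_interacting instead of letting both conditions fire together.
void loop_iteration(bool & is_interacting, bool reverse_prompt_found) {
    if (reverse_prompt_found) {
        is_interacting = true;      // roughly the check near main.cpp:937
    }
    if (!is_interacting) {
        // sample_next_token();     // roughly the branch near main.cpp:995
    } else {
        // read_user_input();
        is_interacting = false;
    }
}
```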
I mentioned it in another issue: llama was trained with tokenizer augmentations, meaning the tokenizer occasionally did sub-optimal word partitioning at training time: https://github.com/google/sentencepiece#subword-regularization-and-bpe-dropout It is claimed that this improves generalization. As a side effect, the trained model now considers forming words out of sub-optimal tokens. The reverse prompt is detected by matching lists of tokens, so if the model produces the reverse prompt text as a different token sequence than the one being searched for, the match fails and the reverse prompt is ignored.
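A minimal sketch of matching by characters instead of token IDs, which sidesteps this; all names here are illustrative rather than the actual ones in main.cpp:

```cpp
#include <cassert>
#include <string>

// Sketch: keep a rolling buffer of recent output text and compare
// characters, so any token split that spells out the reverse prompt
// is detected.
bool append_and_check(std::string & last_output, const std::string & piece,
                      const std::string & reverse_prompt) {
    last_output += piece; // piece = detokenized text of the newest token
    // Only the tail of the buffer can ever match; keep it bounded.
    if (last_output.size() > 2 * reverse_prompt.size()) {
        last_output.erase(0, last_output.size() - reverse_prompt.size());
    }
    return last_output.size() >= reverse_prompt.size() &&
           last_output.compare(last_output.size() - reverse_prompt.size(),
                               reverse_prompt.size(), reverse_prompt) == 0;
}

int main() {
    const std::string reverse_prompt = "\nUser:";
    std::string buffer;
    bool found = false;
    // The same text can arrive under different token splits, e.g.
    // {"\n", "User", ":"} or {"\nUs", "er:"}; both spell the prompt.
    for (const std::string & piece : {"\nUs", "er:"}) {
        found = append_and_check(buffer, piece, reverse_prompt);
    }
    assert(found);
}
```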
I can confirm this has happened to me several times.
Referenced in commit ggml-org#330:

* Check for reverse prompt by characters instead of tokens (#292)
* Update main.cpp: wording.
* Cleanup.
* Remove unnecessary use of std::stringstream.

Co-authored-by: Johnman <tjohnman@github>
Co-authored-by: Georgi Gerganov <[email protected]>