
Reverse prompt is sometimes ignored. #292

Closed
tjohnman opened this issue Mar 19, 2023 · 6 comments
Labels: bug (Something isn't working)

Comments

tjohnman (Contributor) commented Mar 19, 2023

I haven't found a consistent pattern to reproduce this, but sometimes the model will continue outputting text even after it has printed the reverse prompt. If colors are enabled, they change as if the new text were user input, even though it is generated by the model. After this happens, it may or may not revert to its proper behavior once it finds the reverse prompt again.

I have noticed the color change doesn't always happen right at the reverse prompt; sometimes it happens a few words before it. I don't know enough about how this code works yet to speculate, but in case this has something to do with parallelism, I'm using -t 16.

gjmulder added the bug label Mar 19, 2023
tjohnman (Contributor, Author) commented Mar 19, 2023

I think this might have to do with the way the program deals with a large -n token count.

Related to this, there is the issue of interactive mode dropping back out to the shell after the number of requested tokens has been consumed. This makes people use large -n numbers and put things in their prompts like “the conversation goes on forever” in order to try and avoid it.

I think we could handle running out of tokens in interactive mode differently, similar to what alpaca.cpp does: instead of closing the program, reset the counter and go back into interactive mode.
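
Roughly what I have in mind, as a sketch only (the variable names here are illustrative, not necessarily the ones used in main.cpp):

```cpp
// When the token budget runs out in interactive mode, hand control back
// to the user instead of breaking out of the generation loop.
if (params.interactive && remaining_tokens <= 0) {
    remaining_tokens = params.n_predict; // reset the counter
    is_interacting   = true;             // wait for user input instead of exiting
}
```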

I will try to explore both issues later today and see if I can make a pull request.

Piezoid (Contributor) commented Mar 19, 2023

> This makes people use large -n numbers and put things in their prompts like “the conversation goes on forever” in order to try and avoid it.

For this, we could add a flag that forbids the sampler from drawing control tokens like begin/end of stream.
I tried a hardcoded version of that and it works well. I found that banning the BOS (begin of stream) token is also needed, because the model sometimes attempts to start a new document, effectively masking most of the past context.
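
A minimal sketch of the kind of ban I hard-coded (the function and parameter names are illustrative, not the actual llama.cpp API):

```cpp
#include <cmath>
#include <vector>

// Make the control tokens impossible to sample by pushing their logits to -inf.
// The BOS/EOS ids are passed in; in llama.cpp they would come from the vocab.
void ban_control_tokens(std::vector<float> & logits, int bos_id, int eos_id) {
    logits[bos_id] = -INFINITY; // begin-of-stream: "start a new document"
    logits[eos_id] = -INFINITY; // end-of-stream: stop the session early
}
```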

tjohnman (Contributor, Author) commented Mar 19, 2023

> This makes people use large -n numbers and put things in their prompts like “the conversation goes on forever” in order to try and avoid it.
>
> For this, we could add a flag that forbids the sampler from drawing control tokens like begin/end of stream. I tried a hardcoded version of that and it works well. I found that banning the BOS (begin of stream) token is also needed, because the model sometimes attempts to start a new document, effectively masking most of the past context.

Masking past context might be an issue, but I think it's useful to sample end-of-stream symbols if the model decides a particular sequence should end early. The main issue here (which is very easy to solve) is that ending the stream (or running out of token budget) shouldn't close the program while we are in interactive mode; it should instead let the user input more text.

I think I have figured out where the bug is: the sampler doesn't stop when it should. It seems the program keeps sampling new tokens regardless of the interactive state.

If I'm not mistaken, the conditions in main.cpp:937 and main.cpp:995 can be true at the same time, but no sampling should be done while is_interacting == true; otherwise we are sampling more tokens from the model and asking the user for input at the same time. These two states should be exclusive, and making sampling happen only when is_interacting is false should fix this bug.
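
To make that exclusivity concrete, here is a rough sketch of the loop structure I mean (the helper names are placeholders, not the actual functions in main.cpp):

```cpp
// Each iteration either samples from the model or reads user input, never both.
while (remaining_tokens > 0) {
    if (!is_interacting) {
        const int id = sample_next_token();   // model generates the next token
        print_token(id);
        --remaining_tokens;
        if (output_ends_with(reverse_prompt)) {
            is_interacting = true;            // stop sampling, wait for the user
        }
    } else {
        read_user_input();                    // no sampling while interacting
        is_interacting = false;               // resume generation afterwards
    }
}
```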

I'll see if I can fix it, test it and make a pull request.


jarcen commented Mar 20, 2023

I mentioned it in another issue: llama was trained with tokenizer augmentations, which means the tokenizer occasionally did sub-optimal word partitioning at training time:

https://github.com/google/sentencepiece#subword-regularization-and-bpe-dropout

It is claimed that this improves generalization. As a side effect, the trained model now considers forming words out of sub-optimal tokens. The reverse prompt is detected by matching lists of tokens, so if you search for ['▁Your', '▁Input'] and the model outputs ['▁Your', '▁I', 'n', 'p', 'u', 't'], the text will look exactly the same but there will be no match and the user will not receive control. A correct implementation must track the text output by the model and search that text for the reverse prompt by characters rather than by tokens.
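
For example, something along these lines (a sketch only; it assumes the program accumulates the recently decoded output in a string):

```cpp
#include <string>

// Compare the tail of the accumulated output text against the reverse prompt,
// so that any tokenization the model happens to use still produces a match.
bool ends_with_reverse_prompt(const std::string & last_output,
                              const std::string & reverse_prompt) {
    if (last_output.size() < reverse_prompt.size()) return false;
    return last_output.compare(last_output.size() - reverse_prompt.size(),
                               reverse_prompt.size(), reverse_prompt) == 0;
}
```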

tjohnman (Contributor, Author) commented

> I mentioned it in another issue: llama was trained with tokenizer augmentations, which means the tokenizer occasionally did sub-optimal word partitioning at training time:
>
> https://github.com/google/sentencepiece#subword-regularization-and-bpe-dropout
>
> It is claimed that this improves generalization. As a side effect, the trained model now considers forming words out of sub-optimal tokens. The reverse prompt is detected by matching lists of tokens, so if you search for ['▁Your', '▁Input'] and the model outputs ['▁Your', '▁I', 'n', 'p', 'u', 't'], the text will look exactly the same but there will be no match and the user will not receive control. A correct implementation must track the text output by the model and search that text for the reverse prompt by characters rather than by tokens.

I can confirm this has happened to me several times.

tjohnman pushed a commit to tjohnman/llama.cpp that referenced this issue Mar 20, 2023
tomsnunes added a commit to tomsnunes/alpama.cpp that referenced this issue Mar 20, 2023
ggerganov added a commit that referenced this issue Mar 21, 2023
* Check for reverse prompt by characters instead of tokens (#292)

* Update main.cpp

Wording.

* Cleanup.

* Remove unnecessary use of std::stringstream.

---------

Co-authored-by: Johnman <tjohnman@github>
Co-authored-by: Georgi Gerganov <[email protected]>