
Reverse prompt is sometimes ignored. #292

Closed
tjohnman opened this issue Mar 19, 2023 · 6 comments
Labels: bug (Something isn't working)

Comments

tjohnman (Contributor) commented Mar 19, 2023

I haven't found a consistent pattern to reproduce this, but sometimes the model will continue outputting text even after it has printed the reverse prompt. If colors are enabled, they change as if the new text were user input, even though it is generated by the model. After this happens, it may or may not revert to its proper behavior once it finds the reverse prompt again.

I have noticed the color change doesn't always happen right at the reverse prompt; sometimes it happens a few words before it. I don't know enough about how this code works yet to speculate, but in case this has something to do with parallelism, I'm using -t 16.

gjmulder added the bug label Mar 19, 2023
tjohnman (Contributor, Author) commented Mar 19, 2023

I think this might have to do with the way the program deals with a large -n token count.

Related to this, there is the issue of interactive mode dropping back out to the shell after the number of requested tokens has been consumed. This makes people use large -n numbers and put things in their prompts like “the conversation goes on forever” in order to try and avoid it.

I think we could handle running out of tokens in interactive mode differently, similar to what alpaca.cpp does: instead of closing the program, reset the counter and go back into interactive mode.
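
Roughly what I have in mind, as a sketch only (the variable names here are illustrative, not necessarily the ones used in main.cpp):

```cpp
// When the token budget runs out in interactive mode, hand control back
// to the user instead of breaking out of the generation loop.
if (params.interactive && remaining_tokens <= 0) {
    remaining_tokens = params.n_predict; // reset the counter
    is_interacting   = true;             // wait for user input instead of exiting
}
```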

I will try to explore both issues later today and see if I can make a pull request.

Piezoid (Contributor) commented Mar 19, 2023

> This makes people use large -n numbers and put things in their prompts like “the conversation goes on forever” in order to try and avoid it.

For this, we could add a flag that forbids the sampler from drawing control tokens like begin/end of stream.
I tried a hardcoded version of that and it works well. I found that banning the BOS (begin of stream) token is also needed, because the model sometimes attempts to start a new document, effectively masking most of the past context.
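
A minimal sketch of the kind of ban I hard-coded (the function and parameter names are illustrative, not the actual llama.cpp API):

```cpp
#include <cmath>
#include <vector>

// Make the control tokens impossible to sample by pushing their logits to -inf.
// The BOS/EOS ids are passed in; in llama.cpp they would come from the vocab.
void ban_control_tokens(std::vector<float> & logits, int bos_id, int eos_id) {
    logits[bos_id] = -INFINITY; // begin-of-stream: "start a new document"
    logits[eos_id] = -INFINITY; // end-of-stream: stop the session early
}
```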

tjohnman (Contributor, Author) commented Mar 19, 2023

> This makes people use large -n numbers and put things in their prompts like “the conversation goes on forever” in order to try and avoid it.
>
> For this, we could add a flag that forbids the sampler from drawing control tokens like begin/end of stream. I tried a hardcoded version of that and it works well. I found that banning the BOS (begin of stream) token is also needed, because the model sometimes attempts to start a new document, effectively masking most of the past context.

Masking past context might be an issue, but I think it's useful to sample end-of-stream symbols if the model decides a particular sequence should end early. The main issue here (which is very easy to solve) is that ending the stream (or running out of token budget) shouldn't close the program while we are in interactive mode; it should instead let the user input more text.

I think I have figured out where the bug is: the sampler doesn't stop when it should. It seems the program keeps sampling new tokens regardless of the interactive state.

If I'm not mistaken, the conditions in main.cpp:937 and main.cpp:995 can be true at the same time, but no sampling should be done while is_interacting == true; otherwise we are sampling more tokens from the model and asking the user for input at the same time. These two states should be exclusive, and making sampling happen only when is_interacting is false should fix this bug.
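
To make that exclusivity concrete, here is a rough sketch of the loop structure I mean (the helper names are placeholders, not the actual functions in main.cpp):

```cpp
// Each iteration either samples from the model or reads user input, never both.
while (remaining_tokens > 0) {
    if (!is_interacting) {
        const int id = sample_next_token();   // model generates the next token
        print_token(id);
        --remaining_tokens;
        if (output_ends_with(reverse_prompt)) {
            is_interacting = true;            // stop sampling, wait for the user
        }
    } else {
        read_user_input();                    // no sampling while interacting
        is_interacting = false;               // resume generation afterwards
    }
}
```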

I'll see if I can fix it, test it and make a pull request.


jarcen commented Mar 20, 2023

I mentioned it in another issue: llama was trained with tokenizer augmentations, which means the tokenizer occasionally did sub-optimal word partitioning at training time:

https://github.com/google/sentencepiece#subword-regularization-and-bpe-dropout

It is claimed that this improves generalization. As a side effect, the trained model now considers forming words out of sub-optimal tokens. The reverse prompt is detected by matching lists of tokens, so if you search for ['▁Your', '▁Input'] and the model outputs ['▁Your', '▁I', 'n', 'p', 'u', 't'], the text will look exactly the same but there will be no match and the user will not receive control. A correct implementation must track the text output by the model and search that text for the reverse prompt by characters rather than by tokens.
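
For example, something along these lines (a sketch only; it assumes the program accumulates the recently decoded output in a string):

```cpp
#include <string>

// Compare the tail of the accumulated output text against the reverse prompt,
// so that any tokenization the model happens to use still produces a match.
bool ends_with_reverse_prompt(const std::string & last_output,
                              const std::string & reverse_prompt) {
    if (last_output.size() < reverse_prompt.size()) return false;
    return last_output.compare(last_output.size() - reverse_prompt.size(),
                               reverse_prompt.size(), reverse_prompt) == 0;
}
```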

tjohnman (Contributor, Author) commented

> I mentioned it in another issue: llama was trained with tokenizer augmentations, which means the tokenizer occasionally did sub-optimal word partitioning at training time:
>
> https://github.com/google/sentencepiece#subword-regularization-and-bpe-dropout
>
> It is claimed that this improves generalization. As a side effect, the trained model now considers forming words out of sub-optimal tokens. The reverse prompt is detected by matching lists of tokens, so if you search for ['▁Your', '▁Input'] and the model outputs ['▁Your', '▁I', 'n', 'p', 'u', 't'], the text will look exactly the same but there will be no match and the user will not receive control. A correct implementation must track the text output by the model and search that text for the reverse prompt by characters rather than by tokens.

I can confirm this has happened to me several times.

tjohnman pushed a commit to tjohnman/llama.cpp that referenced this issue Mar 20, 2023
tomsnunes added a commit to tomsnunes/alpama.cpp that referenced this issue Mar 20, 2023
ggerganov added a commit that referenced this issue Mar 21, 2023
* Check for reverse prompt by characters instead of tokens (#292)

* Update main.cpp

Wording.

* Cleanup.

* Remove unnecessary use of std::stringstream.

---------

Co-authored-by: Johnman <tjohnman@github>
Co-authored-by: Georgi Gerganov <[email protected]>