[games] Open questions on private/shared game #71

Open
briemadu opened this issue Mar 26, 2024 · 2 comments

@briemadu
Contributor

Some decisions about the privateshared game that could be improved in future versions:

Handling continuations

In an internal meeting, we decided to abort games if models produce continuations beyond the first line break. This was implemented in c9785f9. Although closed models seem to generate only the short responses we want anyway, open-weight models that generate up to an imposed maximum token length will of course fail a priori. Still, some measure to ignore fabricated continuations of upcoming turns is required; otherwise the full hypothetical continuation gets appended to the history. Those continuations may contain wrong tags, wrongly filled slots, etc., which break the conversational grounding structure we need.
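For illustration only, such a measure could be a truncation step like the sketch below (the helper name and the exact rule are made up, not what c9785f9 implements): cut the reply at the first line break before it is appended to the history, instead of aborting.

```python
# Illustrative helper, not the actual game master code: keep only the first
# line of a reply so that fabricated continuations of upcoming turns never
# reach the dialogue history.
def trim_continuation(response: str) -> str:
    """Return the reply up to the first line break, stripped of whitespace."""
    return response.strip().split("\n", 1)[0].strip()

reply = "ANSWER: In May.\nNext round:\nQ: Where are you going?"
print(trim_continuation(reply))  # -> "ANSWER: In May."
```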

Reprompts in (slot filling) turns

Currently, we only do reprompting in the probing rounds. If the model fails to follow the format rules in the main slot-filling rounds, we abort immediately. Reprompting could also be implemented in slot-filling turns.
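A rough sketch of what that could look like; all names here (`get_slot_filling_answer`, `REPROMPT_TEXT`, the format rule) are placeholders, not existing clembench identifiers:

```python
import re
from typing import Callable, Optional

MAX_REPROMPTS = 1  # placeholder value
REPROMPT_TEXT = "Please answer in the required format."  # placeholder reprompt
ANSWER_PATTERN = re.compile(r"^ANSWER: .+")  # placeholder format rule

def is_valid_format(response: str) -> bool:
    return bool(ANSWER_PATTERN.match(response))

def get_slot_filling_answer(player: Callable[[str], str], prompt: str) -> Optional[str]:
    """Ask once and reprompt on format violations; None signals an abort."""
    response = player(prompt)
    for _ in range(MAX_REPROMPTS):
        if is_valid_format(response):
            return response
        response = player(REPROMPT_TEXT)
    return response if is_valid_format(response) else None
```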

Assumption of shared, even if wrong, values

Right now, the game master assumes that, if the correct value is not contained in the model's response to a slot-filling question, the slot's value still turns to shared. The rationale is that a wrong shared value is still a shared value; being wrong is penalised in the slot-filling metric, but it does not prevent the probing rounds from being performed accordingly. However, if the questioner asks e.g. "When are you travelling?" and the model answers "I don't know", no value is actually shared. So we may need better strategies to handle this case in the future.
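One possible refinement, sketched with made-up names (`update_slot`, `NON_ANSWERS` are not existing clembench code): only flip a slot to shared when the answer actually commits to some value, even a wrong one.

```python
NON_ANSWERS = ("i don't know", "i do not know", "not sure")  # illustrative list

def update_slot(slot_state: dict, slot: str, answer: str, correct_value: str) -> None:
    """Mark a slot as shared only if the answer commits to some value."""
    answer_lower = answer.lower()
    if any(phrase in answer_lower for phrase in NON_ANSWERS):
        return  # nothing was shared; the slot stays private
    slot_state[slot] = "shared"  # shared, possibly with a wrong value
    slot_state[f"{slot}_correct"] = correct_value.lower() in answer_lower

state = {}
update_slot(state, "travel_date", "I don't know.", "in May")
print(state)  # {} -> nothing was shared
```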

@Gnurro
Collaborator

Gnurro commented Mar 27, 2024

I think the biggest issue here is that we can't know what sampling and postprocessing is done for the replies from proprietary models.
If we knew, we could set up the same for the open-weight models we run - assuming that the proprietary models even use the same or similar settings across different vendors.

For the HuggingFace transformers backend, there are generation options that I am very sure are used in similar ways for the proprietary models, and which we could leverage.
The first option, which is already exposed as an argument passed to the Player class instance call, is the maximum generation length. Passing a number lower than the (also arbitrary) default of 100 tokens could limit the amount of spurious, unwanted tokens generated.
The second option is to set additional stop tokens, for example tokens that decode to strings containing newlines. The only default stop token in transformers is the EOS token of the respective model - but through prompt templating and model training, many of the open-weight models rarely generate the EOS token either.
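To make the two options concrete, a minimal sketch with the plain transformers `generate()` API; the model name, prompt, and values are only examples, not what the clembench backend actually uses:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # example model, not a clembench choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Q: When are you travelling?\nANSWER:", return_tensors="pt")

# Option 2: treat every token whose decoded string contains a newline as a stop token.
newline_ids = [i for i in range(len(tokenizer)) if "\n" in tokenizer.decode([i])]

outputs = model.generate(
    **inputs,
    max_new_tokens=30,  # option 1: well below the default of 100
    eos_token_id=[tokenizer.eos_token_id] + newline_ids,  # stop at EOS or any newline
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```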

@Gnurro
Collaborator

Gnurro commented Apr 5, 2024

I've looked into the continuation issue: https://github.com/Gnurro/clembench-runs/tree/continuation_check
Just sharing the responses in case anyone wants to check them directly; I'll talk about it in the next meeting.

@davidschlangen davidschlangen changed the title [privateshared] Open questions [games] Open questions on private/shared game Apr 19, 2024