[games] Open questions on private/shared game #71

Open
briemadu opened this issue Mar 26, 2024 · 2 comments

@briemadu
Contributor

Some decisions about the privateshared game that could be improved in future versions:

Handling continuations

In an internal meeting, we decided to abort games if models produce continuations beyond the first line break. This was implemented in c9785f9. Although closed models seem to generate only the short responses we want anyway, open-weight models that generate up to an imposed maximum token length will of course fail a priori. Still, some measure to ignore fabricated continuations of upcoming turns is required; otherwise the full hypothetical continuation gets appended to the history. Those continuations may contain wrong tags, wrongly filled slots, etc., which break the conversational grounding structure we need.
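For illustration only, such a measure could be a truncation step like the sketch below (the helper name and the exact rule are made up, not what c9785f9 implements): cut the reply at the first line break before it is appended to the history, instead of aborting.

```python
# Illustrative helper, not the actual game master code: keep only the first
# line of a reply so that fabricated continuations of upcoming turns never
# reach the dialogue history.
def trim_continuation(response: str) -> str:
    """Return the reply up to the first line break, stripped of whitespace."""
    return response.strip().split("\n", 1)[0].strip()

reply = "ANSWER: In May.\nNext round:\nQ: Where are you going?"
print(trim_continuation(reply))  # -> "ANSWER: In May."
```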

Reprompts in (slot filling) turns

Currently, we only do reprompting in the probing rounds. If the model fails to follow the format rules in the main slot-filling rounds, we abort immediately. Reprompting could also be implemented in slot-filling turns.
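A rough sketch of what that could look like; all names here (`get_slot_filling_answer`, `REPROMPT_TEXT`, the format rule) are placeholders, not existing clembench identifiers:

```python
import re
from typing import Callable, Optional

MAX_REPROMPTS = 1  # placeholder value
REPROMPT_TEXT = "Please answer in the required format."  # placeholder reprompt
ANSWER_PATTERN = re.compile(r"^ANSWER: .+")  # placeholder format rule

def is_valid_format(response: str) -> bool:
    return bool(ANSWER_PATTERN.match(response))

def get_slot_filling_answer(player: Callable[[str], str], prompt: str) -> Optional[str]:
    """Ask once and reprompt on format violations; None signals an abort."""
    response = player(prompt)
    for _ in range(MAX_REPROMPTS):
        if is_valid_format(response):
            return response
        response = player(REPROMPT_TEXT)
    return response if is_valid_format(response) else None
```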

Assumption of shared, even if wrong, values

Right now, the game master assumes that, if the correct value is not contained in the model's response to a slot-filling question, the slot's value still turns to shared. The rationale is that a wrong shared value is still a shared value; being wrong is penalised in the slot-filling metric, but it does not prevent the probing rounds from being performed accordingly. However, if the questioner asks e.g. "When are you travelling?" and the model answers "I don't know", no value is actually shared. So we may need better strategies to handle this case in the future.
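One possible refinement, sketched with made-up names (`update_slot`, `NON_ANSWERS` are not existing clembench code): only flip a slot to shared when the answer actually commits to some value, even a wrong one.

```python
NON_ANSWERS = ("i don't know", "i do not know", "not sure")  # illustrative list

def update_slot(slot_state: dict, slot: str, answer: str, correct_value: str) -> None:
    """Mark a slot as shared only if the answer commits to some value."""
    answer_lower = answer.lower()
    if any(phrase in answer_lower for phrase in NON_ANSWERS):
        return  # nothing was shared; the slot stays private
    slot_state[slot] = "shared"  # shared, possibly with a wrong value
    slot_state[f"{slot}_correct"] = correct_value.lower() in answer_lower

state = {}
update_slot(state, "travel_date", "I don't know.", "in May")
print(state)  # {} -> nothing was shared
```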

@Gnurro
Collaborator

Gnurro commented Mar 27, 2024

I think the biggest issue here is that we can't know what sampling and postprocessing is done for the replies from proprietary models.
If we knew, we could set up the same for the open-weight models we run - assuming that the proprietary models even use the same or similar settings across different vendors.

For the HuggingFace transformers backend, there are generation options that I am very sure are used in similar ways for the proprietary models, and which we could leverage.
The first option, which is already exposed as an argument passed to the Player class instance call, is the maximum generation length. Passing a number lower than the (also arbitrary) default of 100 tokens could limit the amount of spurious, unwanted tokens generated.
The second option is to set additional stop tokens, for example tokens that decode to strings containing newlines. The only default stop token in transformers is the EOS token of the respective model - but through prompt templating and model training, many of the open-weight models rarely generate the EOS token either.
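To make the two options concrete, a minimal sketch with the plain transformers `generate()` API; the model name, prompt, and values are only examples, not what the clembench backend actually uses:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # example model, not a clembench choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Q: When are you travelling?\nANSWER:", return_tensors="pt")

# Option 2: treat every token whose decoded string contains a newline as a stop token.
newline_ids = [i for i in range(len(tokenizer)) if "\n" in tokenizer.decode([i])]

outputs = model.generate(
    **inputs,
    max_new_tokens=30,  # option 1: well below the default of 100
    eos_token_id=[tokenizer.eos_token_id] + newline_ids,  # stop at EOS or any newline
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```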

@Gnurro
Collaborator

Gnurro commented Apr 5, 2024

I've looked into the continuation issue: https://github.com/Gnurro/clembench-runs/tree/continuation_check
Just sharing the responses in case anyone wants to check them directly; I'll talk about it in the next meeting.

@davidschlangen davidschlangen changed the title [privateshared] Open questions [games] Open questions on private/shared game Apr 19, 2024