Add a test to check that Evaluator evaluations match transformers examples #163
Conversation
The documentation is not available anymore as the PR was closed or merged.
Hi @fxmarty, thanks for working on this! That does indeed look useful, kind of a compatibility test between `Trainer` and `Evaluator`.
I have been thinking about this some more: although this is a great tool for debugging the …
Agreed, it's probably a bit overkill, although checking against the PyTorch examples ensures that we keep matching the behavior of the Trainer / PyTorch example scripts in the future. If you would prefer a fixed value in the test, let me know and I will edit accordingly!
After discussing with @LysandreJik I think we can add it. It will be an interesting signal if this fails. Thanks a lot for adding this! I left a few minor comments, let me know if you have any questions.
I don't think we need subfolders at this point; you can just create a new file in `tests` (e.g. `test_trainer_evaluator_parity.py`).
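As a rough illustration of the suggestion above, here is a minimal sketch of what such a `tests/test_trainer_evaluator_parity.py` could contain. The checkpoint (`distilbert-base-uncased-finetuned-sst-2-english`), the IMDB slice, and the column/label choices are assumptions for illustration, not details taken from the PR itself.

```python
# Sketch of a Trainer vs. Evaluator parity test (assumed checkpoint and dataset).
import numpy as np
from datasets import load_dataset
from evaluate import evaluator, load
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)


def test_text_classification_parity(tmp_path):
    model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed checkpoint
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    raw_dataset = load_dataset("imdb", split="test[:64]")  # small slice for speed
    metric = load("accuracy")

    # Reference numbers from Trainer.evaluate(), mirroring the PyTorch example scripts.
    def preprocess(examples):
        return tokenizer(examples["text"], truncation=True)

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        return metric.compute(predictions=np.argmax(logits, axis=-1), references=labels)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=str(tmp_path)),
        eval_dataset=raw_dataset.map(preprocess, batched=True),
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )
    trainer_accuracy = trainer.evaluate()["eval_accuracy"]

    # Same evaluation through the Evaluator / pipeline path.
    evaluator_results = evaluator("text-classification").compute(
        model_or_pipeline=model,
        tokenizer=tokenizer,
        data=raw_dataset,
        metric=metric,
        input_column="text",
        label_column="label",
        label_mapping={"NEGATIVE": 0, "POSITIVE": 1},  # matches this checkpoint's labels
    )

    assert np.isclose(trainer_accuracy, evaluator_results["accuracy"])
```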
LGTM! 🚀
Basically just to check that `Trainer` and `Evaluator` return the same evaluations. This was useful for me when working on Optimum to debug slight differences between evaluations with pipelines vs. `Trainer` (https://github.com/fxmarty/optimum/tree/tests-benchmark/tests/benchmark/onnxruntime), where depending on the task you need to be careful with the kwargs passed to the pipeline to match the output.

Let me know if you think such tests for each task are useful or not. If not, I will just add `TokenClassificationEvaluator`, `QuestionAnsweringEvaluator` and `ImageProcessingEvaluator` without the tests.
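For context on the "careful with the kwargs" remark, here is a hypothetical sketch of passing a pre-built pipeline to the evaluator so that tokenization lines up with what the Trainer-based example scripts do. The checkpoint, dataset slice, and `truncation=True` setting are assumptions for illustration, not details from the PR.

```python
# Hypothetical illustration: the example scripts tokenize with truncation, so a
# pipeline handed to the Evaluator may need the same setting to produce
# identical metrics on long inputs.
from datasets import load_dataset
from evaluate import evaluator, load
from transformers import pipeline

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed checkpoint

# Build the pipeline explicitly so tokenizer kwargs (here, truncation) are under our control.
pipe = pipeline("text-classification", model=model_name, truncation=True)

results = evaluator("text-classification").compute(
    model_or_pipeline=pipe,
    data=load_dataset("imdb", split="test[:64]"),
    metric=load("accuracy"),
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1},
)
print(results["accuracy"])
```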