integration of automatic translation evaluation into model evaluation tools #32

Open
kangsuhyun-yanolja opened this issue Feb 19, 2024 · 5 comments

Comments

@kangsuhyun-yanolja
Collaborator

Currently, there is a need for an automated evaluation tool that simplifies the process of evaluating translation models. The tool should be able to assess the accuracy and quality of translations produced by various models. A potential solution could be to integrate this functionality into an existing framework such as lm_evaluation_harness or to build a standalone service.
This service could accept inputs in formats such as CSV or JSONL, giving users a straightforward way to obtain evaluations. Results from this tool could be essential for models aiming to participate in Arena.
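As a rough illustration only (the field names and the file name below are assumptions, not a decided spec), a JSONL submission for such a service might carry one record per segment with the source text, the model's translation, and a reference translation:

```python
import json

# Hypothetical JSONL input for the proposed evaluation service.
# The field names ("source", "translation", "reference") are illustrative
# assumptions, not a decided format.
records = [
    {"source": "안녕하세요.", "translation": "Hello.", "reference": "Hello."},
    {"source": "감사합니다.", "translation": "Thank you.", "reference": "Thank you."},
]

with open("submission.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```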

@kangsuhyun-yanolja
Collaborator Author

@hist0613 Hello. May I ask how we could run the automatic evaluation of translation models?

@hist0613

hist0613 commented Feb 19, 2024

@kangsuhyun-yanolja

  1. My work evaluates a model's translations against given references (gold translations), which is why I call it reference-based.
  2. You can refer to the repo (yanolja-org/iab-eval-translation).
  3. Among the files, I recommend looking at run_evaluation.py. It works as follows (a minimal sketch of the same flow is given after this list):
    a. It expects two translation files, one containing the gold translations and the other containing the output of the translation system to be evaluated, as in ./translations/ai-hub-ko-en.
    b. You can evaluate a given translation file (./translations/ai-hub-ko-en/deepl.jsonl) with a given metric (such as BLEU), as shown in the shell script ./scripts/evaluate.sh.
    c. By default, the evaluation results are written to ./results/eval_results.json, or to ./results/eval_results.md if you also run the visualization script.
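For readers without access to the repo, here is a minimal sketch of the reference-based flow described above. It assumes one JSONL record per segment with a "translation" field, uses a hypothetical gold.jsonl file name, and relies on sacrebleu as a stand-in for the repo's own metric code; the actual run_evaluation.py may differ.

```python
import json
import os

from sacrebleu.metrics import BLEU  # pip install sacrebleu


def load_translations(path: str, field: str = "translation") -> list[str]:
    # Field name is an assumption; adjust to the actual JSONL schema.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)[field] for line in f if line.strip()]


# Hypothetical file layout mirroring ./translations/ai-hub-ko-en:
# one gold-translation file and one system-output file.
references = load_translations("./translations/ai-hub-ko-en/gold.jsonl")
hypotheses = load_translations("./translations/ai-hub-ko-en/deepl.jsonl")

# Corpus-level BLEU of the system output against the gold translations.
score = BLEU().corpus_score(hypotheses, [references])

os.makedirs("./results", exist_ok=True)
with open("./results/eval_results.json", "w", encoding="utf-8") as f:
    json.dump({"metric": "BLEU", "score": round(score.score, 2)}, f, indent=2)

print(f"BLEU: {score.score:.2f}")
```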

@kangsuhyun-yanolja
Collaborator Author

@hist0613 Thank you for the detailed explanation!

@kimsooyeon-yanolja

@kangsuhyun-yanolja
It would be nice to expand some functionality in the translation part.
To enhance the usability of the translation UI, I suggest (1) adding an alert function and (2) adding a reset button.

  1. Adding an alert function (see the sketch after this list)
    1-1) If the source language and the target language are the same (i.e., both drop-downs hold the same value)
    [Alert] 'Source language and target language are the same.'

    1-2) If the language of the sentence in the prompt differs from the selected source language
    [Alert] 'The language of the prompt does not match the selected source language.'

  2. Add a reset button to allow multiple attempts
    : It looks like a separate button is needed to clear the prompt or run again.
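
A minimal sketch of the two alert checks in item 1, assuming a language detector is available; langdetect and the two-letter language codes here are stand-ins for whatever the UI actually uses, not the project's real code.

```python
from langdetect import detect  # pip install langdetect; stand-in for the project's detector


def validate_translation_request(source_lang: str, target_lang: str, prompt: str) -> str | None:
    """Return an alert message if the request is invalid, otherwise None."""
    # 1-1) the source and target drop-downs hold the same value
    if source_lang == target_lang:
        return "Source language and target language are the same."

    # 1-2) the prompt's detected language differs from the selected source language
    if detect(prompt) != source_lang:
        return "The language of the prompt does not match the selected source language."

    return None


print(validate_translation_request("ko", "ko", "안녕하세요"))  # 1-1 alert
print(validate_translation_request("en", "ko", "안녕하세요"))  # 1-2 alert
print(validate_translation_request("ko", "en", "안녕하세요"))  # None (valid)
```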

cc. @seungduk-yanolja

@kangsuhyun-yanolja
Collaborator Author

Thank you for the comment! Regarding item 1-2 in particular, I think we need to handle it now. Since we're already using a language detector, it would be better to trust it; then users won't have to select both options. I'll create an issue about it.
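
A minimal sketch of that direction, trusting the detector for the source side so the user only selects the target language (again using langdetect as a stand-in for the detector the UI actually uses):

```python
from langdetect import detect  # stand-in for the project's language detector


def build_request(prompt: str, target_lang: str) -> dict:
    """Detect the source language instead of asking the user to select it."""
    source_lang = detect(prompt)  # e.g. "ko" for a Korean prompt
    if source_lang == target_lang:
        raise ValueError("Source language and target language are the same.")
    return {"source_lang": source_lang, "target_lang": target_lang, "text": prompt}


print(build_request("안녕하세요.", "en"))
# {'source_lang': 'ko', 'target_lang': 'en', 'text': '안녕하세요.'}
```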
