integration of automatic translation evaluation into model evaluation tools #32

Open
kangsuhyun-yanolja opened this issue Feb 19, 2024 · 5 comments

Comments

@kangsuhyun-yanolja
Collaborator

Currently, there is a need for an automated evaluation tool that simplifies the process of evaluating translation models. The tool should be able to assess the accuracy and quality of translations produced by various models. A potential solution could be to integrate this functionality into an existing framework such as lm_evaluation_harness or to build a standalone service.
This service could accept inputs in formats such as CSV or JSONL, giving users a straightforward way to obtain evaluations. Results from this tool could be essential for models aiming to participate in Arena.
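As a rough illustration only (the field names and the file name below are assumptions, not a decided spec), a JSONL submission for such a service might carry one record per segment with the source text, the model's translation, and a reference translation:

```python
import json

# Hypothetical JSONL input for the proposed evaluation service.
# The field names ("source", "translation", "reference") are illustrative
# assumptions, not a decided format.
records = [
    {"source": "안녕하세요.", "translation": "Hello.", "reference": "Hello."},
    {"source": "감사합니다.", "translation": "Thank you.", "reference": "Thank you."},
]

with open("submission.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```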

@kangsuhyun-yanolja
Collaborator Author

@hist0613 Hello. May I ask how we could run the automatic evaluation of translation models?

@hist0613

hist0613 commented Feb 19, 2024

@kangsuhyun-yanolja

  1. My work evaluates a model's translations against given references (gold translations), which is why I call it reference-based.
  2. You can refer to the repo (yanolja-org/iab-eval-translation).
  3. Among the files, I recommend looking at run_evaluation.py. It works as follows (a minimal sketch of the same flow is given after this list):
    a. It expects two translation files, one containing the gold translations and the other containing the output of the translation system to be evaluated, as in ./translations/ai-hub-ko-en.
    b. You can evaluate a given translation file (./translations/ai-hub-ko-en/deepl.jsonl) with a given metric (such as BLEU), as shown in the shell script ./scripts/evaluate.sh.
    c. By default, the evaluation results are written to ./results/eval_results.json, or to ./results/eval_results.md if you also run the visualization script.
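For readers without access to the repo, here is a minimal sketch of the reference-based flow described above. It assumes one JSONL record per segment with a "translation" field, uses a hypothetical gold.jsonl file name, and relies on sacrebleu as a stand-in for the repo's own metric code; the actual run_evaluation.py may differ.

```python
import json
import os

from sacrebleu.metrics import BLEU  # pip install sacrebleu


def load_translations(path: str, field: str = "translation") -> list[str]:
    # Field name is an assumption; adjust to the actual JSONL schema.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)[field] for line in f if line.strip()]


# Hypothetical file layout mirroring ./translations/ai-hub-ko-en:
# one gold-translation file and one system-output file.
references = load_translations("./translations/ai-hub-ko-en/gold.jsonl")
hypotheses = load_translations("./translations/ai-hub-ko-en/deepl.jsonl")

# Corpus-level BLEU of the system output against the gold translations.
score = BLEU().corpus_score(hypotheses, [references])

os.makedirs("./results", exist_ok=True)
with open("./results/eval_results.json", "w", encoding="utf-8") as f:
    json.dump({"metric": "BLEU", "score": round(score.score, 2)}, f, indent=2)

print(f"BLEU: {score.score:.2f}")
```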

@kangsuhyun-yanolja
Collaborator Author

@hist0613 Thank you for the detailed explanation!

@kimsooyeon-yanolja

@kangsuhyun-yanolja
It would be nice to expand some functionality in the translation part.
To enhance the usability of the translation UI, I suggest (1) adding an alert function and (2) adding a reset button.

  1. Adding an alert function (see the sketch after this list)
    1-1) If the source language and the target language are the same (i.e., both drop-downs hold the same value)
    [Alert] 'Source language and target language are the same.'

    1-2) If the language of the sentence in the prompt differs from the selected source language
    [Alert] 'The language of the prompt does not match the selected source language.'

  2. Add a reset button to allow multiple attempts
    : It looks like a separate button is needed to clear the prompt or run again.
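
A minimal sketch of the two alert checks in item 1, assuming a language detector is available; langdetect and the two-letter language codes here are stand-ins for whatever the UI actually uses, not the project's real code.

```python
from langdetect import detect  # pip install langdetect; stand-in for the project's detector


def validate_translation_request(source_lang: str, target_lang: str, prompt: str) -> str | None:
    """Return an alert message if the request is invalid, otherwise None."""
    # 1-1) the source and target drop-downs hold the same value
    if source_lang == target_lang:
        return "Source language and target language are the same."

    # 1-2) the prompt's detected language differs from the selected source language
    if detect(prompt) != source_lang:
        return "The language of the prompt does not match the selected source language."

    return None


print(validate_translation_request("ko", "ko", "안녕하세요"))  # 1-1 alert
print(validate_translation_request("en", "ko", "안녕하세요"))  # 1-2 alert
print(validate_translation_request("ko", "en", "안녕하세요"))  # None (valid)
```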

cc. @seungduk-yanolja

@kangsuhyun-yanolja
Collaborator Author

Thank you for the comment! Regarding item 1-2 in particular, I think we need to handle it now. Since we're already using a language detector, it would be better to trust it; then users won't have to select both options. I'll create an issue about it.
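
A minimal sketch of that direction, trusting the detector for the source side so the user only selects the target language (again using langdetect as a stand-in for the detector the UI actually uses):

```python
from langdetect import detect  # stand-in for the project's language detector


def build_request(prompt: str, target_lang: str) -> dict:
    """Detect the source language instead of asking the user to select it."""
    source_lang = detect(prompt)  # e.g. "ko" for a Korean prompt
    if source_lang == target_lang:
        raise ValueError("Source language and target language are the same.")
    return {"source_lang": source_lang, "target_lang": target_lang, "text": prompt}


print(build_request("안녕하세요.", "en"))
# {'source_lang': 'ko', 'target_lang': 'en', 'text': '안녕하세요.'}
```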
