In order to better cater to the needs of our community, we plan to incorporate a range of diverse NLP datasets into the platform. These datasets will cover various domains such as text classification, natural language inference, coreference resolution, semantic matching, question answering, text generation, and more. This enhancement aims to provide a more comprehensive and in-depth assessment of models.
Below is the list of NLP datasets we intend to add, along with their respective categories. Datasets marked with ⭐️ are in high demand:
We warmly welcome every member of the OpenCompass community to actively participate and collaborate in seamlessly integrating these datasets into our evaluation platform. Here's how you can get involved:
Choose Datasets: In the comments, let us know which datasets you're interested in helping to add, or provide suggestions for existing ones.
Dataset Information: If you're aware of relevant links, descriptions, licensing details, etc., please do share them as it will greatly aid our integration efforts. Here's a template to follow:
Name: GSM8k
Link: [https://github.com/openai/grade-school-math](https://github.com/openai/grade-school-math)
Introduction: GSM8K is a grade school math question answering task that requires solving multi-step word problems with basic arithmetic. The dataset consists of about 8.5k problems (roughly 7.5k training samples and 1k test samples), all in English.
Sample:
{
"question": "If f(x) = x^2 + 3x - 2, what is f(-1)?",
"answer": "f(-1) = (-1)^2 + 3(-1) - 2\nf(-1) = 1 - 3 - 2\nf(-1) = -4"
}
License: MIT License
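As a rough illustration of the template above, the following sketch checks that a suggestion fills in every field and that its sample parses as JSON. The helper names (`missing_fields`, `sample_is_valid_json`) and the dictionary layout are assumptions made for this example only; they are not part of OpenCompass.

```python
import json

# Field names taken from the template above.
REQUIRED_FIELDS = ("Name", "Link", "Introduction", "Sample", "License")


def missing_fields(entry):
    """Return the template fields that are absent or blank in `entry`."""
    return [f for f in REQUIRED_FIELDS if not str(entry.get(f) or "").strip()]


def sample_is_valid_json(entry):
    """Check that the Sample field holds a JSON object (hypothetical check)."""
    try:
        parsed = json.loads(entry["Sample"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return False
    return isinstance(parsed, dict)


# Illustrative suggestion, with values shortened from the template above.
suggestion = {
    "Name": "GSM8k",
    "Link": "https://github.com/openai/grade-school-math",
    "Introduction": "Grade school math question answering task.",
    "Sample": '{"question": "If f(x) = x^2 + 3x - 2, what is f(-1)?", '
              '"answer": "f(-1) = -4"}',
    "License": "MIT License",
}

print("missing fields:", missing_fields(suggestion))
print("sample parses as JSON:", sample_is_valid_json(suggestion))
```

Running a check like this before posting a comment can catch a forgotten license or a malformed sample early.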
New Dataset Suggestions: If you have other suitable dataset recommendations, feel free to share your insights in the comments.
How to add?
Adding a new dataset involves several steps:
Documentation: Please visit the documentation, which provides a step-by-step guide on how to add a new dataset.
Check Input & Output: Once your new dataset config is ready, use the Prompt Viewer tool in OpenCompass to easily check the Input & Output.
Preparation: Follow the documentation and write code, then create the corresponding Pull Request. If you are not familiar with Pull Requests, this Contribution Guide may help you :>
Description: In your Pull Request, provide a detailed description of the datasets you intend to add, along with the relevant links, descriptions, licensing information, and the bilingual content you shared earlier in this issue. Submit your Pull Request. Our community reviewers will then assess your contribution and provide feedback.
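Before writing an actual dataset config, it can help to prototype the loading and prompt-building logic in plain Python. The sketch below is not the OpenCompass API and does not replace the documentation or the Prompt Viewer; it uses only the standard library, and the names `load_jsonl`, `build_prompt`, and `final_line`, the JSONL file layout, and the prompt template are all assumptions for illustration. It reuses the sample from the template earlier in this issue.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per line, skipping blank lines."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows


def build_prompt(row, template="Question: {question}\nAnswer:"):
    """Fill a hypothetical prompt template with the question field."""
    return template.format(question=row["question"])


def final_line(answer):
    """Use the last non-empty line of a worked answer as the reference."""
    lines = [ln.strip() for ln in answer.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


sample = {
    "question": "If f(x) = x^2 + 3x - 2, what is f(-1)?",
    "answer": "f(-1) = (-1)^2 + 3(-1) - 2\nf(-1) = 1 - 3 - 2\nf(-1) = -4",
}

# Write the sample to a temporary JSONL file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.jsonl"
    path.write_text(json.dumps(sample) + "\n", encoding="utf-8")
    rows = load_jsonl(path)

for row in rows:
    print(build_prompt(row))
    print("reference:", final_line(row["answer"]))
```

Once the input and reference look right here, the same logic can be moved into a dataset class and config as described in the documentation.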
By following these steps, you can actively contribute to enriching the OpenCompass evaluation platform with new and valuable datasets. If you encounter any issues during the process or need further assistance, feel free to ask. We appreciate your dedication to making OpenCompass a more diverse and comprehensive resource for NLP model evaluation. Happy contributing!
Would you like to implement this feature yourself?
I would like to implement this feature myself and contribute code to OpenCompass!
liushz changed the title from "[Feature] Collaborate to Enhance the OpenCompass : Introducing Diverse NLP Dataset" to "🔥 Collaborate to Enhance the OpenCompass : Introducing Diverse NLP Dataset" on Aug 23, 2023
Describe the feature
Hello everyone!
In order to better cater to the needs of our community, we plan to incorporate a range of diverse NLP datasets into the platform. These datasets will cover various domains such as text classification, natural language inference, coreference resolution, semantic matching, question answering, text generation, and more. This enhancement aims to provide a more comprehensive and in-depth assessment of models.
Below is the list of NLP datasets we intend to add, along with their respective categories. Datasets marked with ⭐️ are in high demand:
Text Classification:
Natural Language Inference:
Coreference Resolution:
Semantic Matching:
General Question Answering:
Multi-Turn Question Answering:
Knowledge Question Answering:
Reasoning Question Answering:
Safety, Ethics, and Morality:
Text Generation: