SWE-bench: Can Language Models Resolve Real-World GitHub Issues? #758
Labels
AI-Agents
Autonomous AI agents using LLMs
code-generation
Code generation models and tools like Copilot and Aider
dataset
Public datasets and embeddings
human-verified
llm
Large Language Models
llm-benchmarks
Testing and benchmarking large language models
llm-evaluation
Evaluating Large Language Model performance and behavior through human-written evaluation sets
MachineLearning
ML models, training, and inference
Models
LLM and ML model repos and links
Papers
Research papers
software-engineering
Best practices for software engineering
SWE-bench README
Code and data for our ICLR 2024 paper [SWE-bench: Can Language Models Resolve Real-World GitHub Issues?](http://swe-bench.github.io/paper.pdf)
Please refer to our website for the public leaderboard, and to the change log for information on the latest updates to the SWE-bench benchmark.
Overview
SWE-bench is a benchmark for evaluating large language models on real-world software issues collected from GitHub. Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.
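To make the task concrete, here is a small sketch of one task instance. The field names (`repo`, `base_commit`, `problem_statement`) follow the published dataset, but the values are invented for illustration, and the truncated diff only shows the shape of an answer, not a real fix:

```python
# One SWE-bench task, schematically. Field names follow the published
# dataset; the values here are invented for illustration.
task = {
    "repo": "astropy/astropy",          # GitHub repository the issue is from
    "base_commit": "abc123",            # repo state the patch must apply to
    "problem_statement": "Unit conversion fails when ...",  # the issue text
}

# The model's job: emit a unified diff against the repo at base_commit.
# The benchmark applies this patch and runs the project's tests to decide
# whether the issue is resolved.
model_patch = """\
diff --git a/astropy/units/core.py b/astropy/units/core.py
--- a/astropy/units/core.py
+++ b/astropy/units/core.py
@@ (hunks omitted in this sketch)
"""
```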
Set Up
To build SWE-bench from source, follow these steps:
1. `cd` into the repository.
2. Run `conda env create -f environment.yml` to create a conda environment named `swe-bench`.
3. Run `conda activate swe-bench` to activate the environment.
Usage
You can download the SWE-bench dataset directly (dev and test sets) or from HuggingFace.
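For example, assuming the `datasets` library and the `princeton-nlp/SWE-bench` dataset ID on HuggingFace, loading and inspecting the test split looks roughly like this:

```python
# Minimal sketch: load SWE-bench from HuggingFace (pip install datasets).
from datasets import load_dataset

swebench = load_dataset("princeton-nlp/SWE-bench", split="test")

print(len(swebench))                  # number of task instances
print(swebench[0]["instance_id"])     # unique ID, e.g. "astropy__astropy-12907"
print(swebench[0]["problem_statement"][:200])  # start of the issue text
```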
To use SWE-bench, you can run inference with a model to generate candidate patches and then evaluate those patches against the benchmark; see the downloads and tutorials below.
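As a rough sketch of that flow (the `call_model` function below is a hypothetical stand-in for your own model; the three prediction keys are the ones the repo documents for evaluation):

```python
# Hedged sketch of an inference loop over SWE-bench. `call_model` is a
# hypothetical placeholder; the prediction keys ("instance_id",
# "model_name_or_path", "model_patch") follow the repo's predictions format.
import json
from datasets import load_dataset

def call_model(problem_statement: str) -> str:
    """Placeholder: return a unified diff addressing the issue."""
    return ""  # swap in a real LLM call here

predictions = []
for inst in load_dataset("princeton-nlp/SWE-bench", split="test"):
    predictions.append({
        "instance_id": inst["instance_id"],
        "model_name_or_path": "my-model",  # identifies your model
        "model_patch": call_model(inst["problem_statement"]),
    })

with open("predictions.json", "w") as f:
    json.dump(predictions, f)
```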
Downloads
Tutorials
We've also written the following blog posts on how to use different parts of SWE-bench. If you'd like to see a post about a particular topic, please let us know via an issue.
Contributions
We would love to hear from the broader NLP, Machine Learning, and Software Engineering research communities, and we welcome any contributions, pull requests, or issues! To contribute, please file a new pull request or issue and fill in the corresponding template. We'll be sure to follow up shortly!
Contact person: Carlos E. Jimenez and John Yang (Email: {carlosej, jy1682}@princeton.edu).
Citation
If you find our work helpful, please use the following citations.
License
MIT. Check `LICENSE.md`.