Run RL
First follow instructions in verl to install the main repo, then locally install this repo.
git clone https://github.com/koalazf99/nanoverl.git nanoverl
cd nanoverl
pip install -e .
All scripts for RL experiments are in nanoverl/example/
. For example, we can run the following script to train deepscaler dataset using R1-Distill-Qwen-1.5B with GRPO algorithm:
cd examples/deepscaler
python prepare_dataset.py
bash train_grpo_r1_distill_1b_8k.bash
The evaluation script is also a "nano" version thanks to sglang. We use sglang-router to serve multiple backends.
python -m sglang_router.launch_server \
--model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
--port 30000 --dp-size 8
python reasoning_eval.py \
--data-path nanoverl/aime \
--parallel 256 \
--num-tries 16
pip install poetry
poetry init
poetry build