Source code of the paper "A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models".
- Clone this repository:

```bash
git clone https://github.com/THUDM/MoELoRA_Riemannian.git
```
- Install dependencies:

```bash
conda create -n moelora_riemannian python=3.10 -y
conda activate moelora_riemannian
pip install -r requirements.txt
```
- Prepare the datasets with this script (see the illustrative sketch after this list):

```bash
python prepare_data.py \
    --save_path "./dataset/scienceqa"
```
- Organize your datasets in the following structure:

```
MoELoRA_Riemannian/dataset/
├── scienceqa/
│   ├── science_qa.hf
│   ├── scienceqa_train.json
│   ├── scienceqa_test.json
│   └── ...
└── ...
```
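The prepared directory contains both a saved Hugging Face dataset (`science_qa.hf`) and plain JSON train/test splits. Purely as an illustration of how such a layout could be produced, here is a hypothetical sketch using the `datasets` library; the hub ID `derek-thomas/ScienceQA`, the `image` column handling, and the function name are assumptions, not what `prepare_data.py` actually does.

```python
# Hypothetical sketch only: the repo's prepare_data.py already handles this step.
# The hub ID and column handling below are assumptions, not the script's actual logic.
import json
import os

from datasets import load_dataset


def prepare_scienceqa(save_path: str = "./dataset/scienceqa") -> None:
    os.makedirs(save_path, exist_ok=True)
    ds = load_dataset("derek-thomas/ScienceQA")  # assumed hub mirror of ScienceQA
    # Save the full DatasetDict in Hugging Face's on-disk format.
    ds.save_to_disk(os.path.join(save_path, "science_qa.hf"))
    # Dump JSON splits, dropping the (assumed) image column so rows are JSON-serializable.
    for split, fname in [("train", "scienceqa_train.json"), ("test", "scienceqa_test.json")]:
        records = ds[split].remove_columns("image")
        with open(os.path.join(save_path, fname), "w") as f:
            json.dump([dict(row) for row in records], f, indent=2)
```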
- Train the model with one of the following commands:

```bash
# Usage: CUDA_VISIBLE_DEVICES=[GPU_ID] python -m torch.distributed.launch --nproc_per_node 1 [TRAINING_SCRIPT] [DATASET] [OPTIMIZER] [METHOD]

# Train MoE-LoRA with per-expert classic Riemannian preconditioners (SGD optimizer)
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 train_llama.py ScienceQA sgd riemannian

# Train MoE-LoRA with per-expert classic Riemannian preconditioners (AdamW optimizer)
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 train_llama.py ScienceQA adamw riemannian

# Train MoE-LoRA with our method (SGD optimizer)
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 train_llama.py ScienceQA sgd ourmethod

# Train MoE-LoRA with our method (AdamW optimizer)
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 train_llama.py ScienceQA adamw ourmethod
```
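For orientation, the `riemannian` option corresponds to the classic per-expert Riemannian (scaled-gradient) preconditioner applied to each LoRA expert's gradients before the base optimizer step. The snippet below is only a minimal, hypothetical sketch of that classic update for a single expert with ΔW = B·A; the function name, tensor conventions, and the `eps` damping term are assumptions, and this is neither the repository's actual implementation nor the paper's proposed method (`ourmethod`).

```python
import torch


@torch.no_grad()
def precondition_lora_grads(A: torch.Tensor, B: torch.Tensor, eps: float = 1e-6) -> None:
    """Hypothetical sketch of the classic Riemannian preconditioner for one
    LoRA expert with Delta W = B @ A, where A is (r, d_in) and B is (d_out, r).
    Intended to run after loss.backward() and before optimizer.step()."""
    r = A.shape[0]
    damp = eps * torch.eye(r, device=A.device, dtype=A.dtype)
    # Left-precondition grad_A by (B^T B + eps*I)^{-1}.
    A.grad = torch.linalg.solve(B.T @ B + damp, A.grad)
    # Right-precondition grad_B by (A A^T + eps*I)^{-1} (symmetric, so solve on the transpose).
    B.grad = torch.linalg.solve(A @ A.T + damp, B.grad.T).T
```

In a training loop, such a preconditioning step would typically be applied once per expert, between the backward pass and the SGD or AdamW step.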