Re-implementation of Sanyuan-Chen/RecAdam.
- (New!) `NeurIPS_2024` branch (2024-12)
- Simpler interfaces with fewer tuning parameters.
- Compatible with DeepSpeed.
```bash
pip install git+https://github.com/hitachi-nlp/rec-adam.git@NeurIPS_2024
```
```python
from rec_adam import build_rec_adam_optimizer

model = (...)  # load your model, such as LLaMA

optimizer = build_rec_adam_optimizer(
    model,
    learning_rate=1e-05,
    fisher_coef=2000,
)
```
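Once built, the optimizer drops into a standard PyTorch training loop; the anchoring toward the pretrained weights is handled by the optimizer itself, so no extra loss term is needed in your code. A minimal sketch, where `dataloader` is a placeholder for your own `torch.utils.data.DataLoader`:

```python
# Standard fine-tuning loop; nothing RecAdam-specific is required here.
for batch in dataloader:
    loss = model(**batch).loss  # e.g., a HuggingFace causal-LM loss
    loss.backward()
    optimizer.step()   # the pretrained-weight anchoring is applied inside the step
    optimizer.zero_grad()
```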
The loss will become something like `loss = loss_original + target_task_weight * (fisher_coef * l2_term)`. Note that `target_task_weight` works differently from the original implementation, where the loss is something like `loss = (1 - target_task_weight) * loss_original + (...)`.
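For illustration, the effective objective can be written out as below. This is a hedged sketch, not the library's actual code; `effective_loss` is a hypothetical name and `pretrained_params` is a hypothetical snapshot of the weights taken before fine-tuning:

```python
import torch

def effective_loss(loss_original, model, pretrained_params,
                   target_task_weight, fisher_coef=2000):
    # Quadratic penalty pulling each parameter back toward its pretrained value.
    l2_term = sum(
        ((p - p0) ** 2).sum()
        for p, p0 in zip(model.parameters(), pretrained_params)
    )
    return loss_original + target_task_weight * (fisher_coef * l2_term)
```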
`fisher_coef` should be tuned for each model and task. Note that the default value of 2000 works well for training LLaMA-3.1-8B on FLD2.
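A hypothetical way to tune it is a simple grid search on a validation set; `load_model` and `train_and_evaluate` below are stand-ins for your own model-loading and fine-tuning/validation routines:

```python
from rec_adam import build_rec_adam_optimizer

best_coef, best_val_loss = None, float("inf")
for coef in (200, 2000, 20000):  # log-scale grid around the default
    model = load_model()         # reload pretrained weights for each run (placeholder)
    optimizer = build_rec_adam_optimizer(
        model,
        learning_rate=1e-05,
        fisher_coef=coef,
    )
    val_loss = train_and_evaluate(model, optimizer)  # placeholder routine
    if val_loss < best_val_loss:
        best_coef, best_val_loss = coef, val_loss
```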
If you use the HuggingFace `Trainer`, `RecAdamTrainer` can be used in its place:

```python
from rec_adam import RecAdamTrainer

trainer = RecAdamTrainer(
    training_args,
    rec_adam_fisher_coef=2000,
)
```
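Assuming `RecAdamTrainer` inherits the usual `transformers.Trainer` workflow (an assumption; check the class for the exact signature), training then proceeds as normal:

```python
trainer.train()       # standard Trainer entry point, assumed to be inherited
trainer.save_model()  # save the fine-tuned weights as usual
```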
We do not recommend initializing the optimizer directly via its constructor, as choosing suitable arguments is complex. Still, you can do it like this:

```python
from rec_adam import RecAdam

optimizer = RecAdam(...)
```