Accepted to NAACL 2025
## Table Of Contents
- About The Repository
- Getting Started
- Data Preparation
- Running Experiments
- Inference
- Authors
- Citation
- License
## About The Repository

This repository hosts the code for our paper AMPS: ASR with Multimodal Paraphrase Supervision, accepted to NAACL 2025. The main contribution of the paper is 🔎 AMPS, a new technique that augments a multilingual multimodal ASR system with paraphrase-based supervision for improved conversational ASR.
## Getting Started

Clone the repository:

```shell
git clone https://github.com/csalt-research/amps-asr.git
cd amps-asr
```

Install the fairseq2 dependencies:

```shell
cd fairseq2
pip install --editable .
```

Install the seamless dependencies:

```shell
cd seamless
pip install --editable .
```

If you encounter any issues while installing dependencies, refer to the Installation Guide.

You are all set! 🎉
## Data Preparation

Seamless needs the dataset in a JSON-lines format, with one entry per line in the following structure:

```json
{"source": {"id": "<ID>", "text": "<T2T-pipeline-input-text>", "lang": "<T2T-pipeline-input-language>", "audio_local_path": "<path-to-audio-file>", "sample_rate": <audio-sample-rate>, "waveform": null, "units": null}, "target": {"id": "<ID>", "text": "<ASR-pipeline-target-text>", "lang": "<ASR+T2T-pipeline-target-language>", "audio_local_path": null, "sample_rate": null, "waveform": null, "units": null, "paraphrase": "<T2T-pipeline-target-paraphrase>"}}
{...}
{...}
...
```

We have provided a sample dataset in the sample_data folder.
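Manifest lines with this structure can also be assembled programmatically. The sketch below is illustrative only; the IDs, texts, languages, and audio path are placeholder values, not taken from the sample data:

```python
import json

def make_manifest_entry(uid, src_text, src_lang, audio_path, sample_rate,
                        tgt_text, tgt_lang, paraphrase):
    """Build one JSON-lines manifest entry in the structure shown above."""
    return {
        "source": {
            "id": uid, "text": src_text, "lang": src_lang,
            "audio_local_path": audio_path, "sample_rate": sample_rate,
            "waveform": None, "units": None,
        },
        "target": {
            "id": uid, "text": tgt_text, "lang": tgt_lang,
            "audio_local_path": None, "sample_rate": None,
            "waveform": None, "units": None,
            "paraphrase": paraphrase,
        },
    }

# One manifest line, ready to append to the JSON-lines dataset file.
entry = make_manifest_entry(
    "utt_0001", "hello there", "eng", "audio/utt_0001.wav", 16000,
    "hello there", "eng", "hi there",
)
line = json.dumps(entry)
```

Each call produces one line of the dataset file; write one such line per utterance.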
## Running Experiments

Our codebase has a simple, easily customizable script run.sh. To launch an experiment, simply execute:

```shell
./run.sh s2t_loss_ratio t2t_loss_ratio loss_threshold
```

Note: The threshold is a tunable parameter that can help improve performance. By default, it is set to -1, meaning no thresholding is applied.

For example, to run only ASR fine-tuning without any thresholding, execute:

```shell
./run.sh 1 0 -1
```

To run AMPS with a threshold of 3.2, execute:

```shell
./run.sh 1 1 3.2
```

After fine-tuning, the model is saved in the directory $EXPERIMENT_DIR.
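The two ratios weight the speech-to-text (ASR) and text-to-text (paraphrase) losses, and the threshold gates when the paraphrase term contributes. The sketch below is one plausible reading of how these three arguments combine, not a transcription of run.sh; the gating rule is an assumption chosen so that the default threshold of -1 reproduces the "no thresholding" behavior:

```python
def combined_loss(s2t_loss, t2t_loss, s2t_ratio, t2t_ratio, threshold):
    """Hypothetical AMPS objective: weighted S2T loss, plus the weighted
    T2T (paraphrase) loss only when the S2T loss exceeds `threshold`.
    With threshold = -1 the T2T term is always included."""
    loss = s2t_ratio * s2t_loss
    if s2t_loss > threshold:
        loss += t2t_ratio * t2t_loss
    return loss

# ./run.sh 1 0 -1  -> ASR-only fine-tuning: T2T weight is zero.
assert combined_loss(2.5, 1.0, 1, 0, -1) == 2.5
# ./run.sh 1 1 3.2 -> paraphrase loss is added only for hard utterances.
assert combined_loss(2.5, 1.0, 1, 1, 3.2) == 2.5
assert combined_loss(4.0, 1.0, 1, 1, 3.2) == 5.0
```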
## Inference

We need to create a new .yaml card (say, custom_model.yaml) for the newly fine-tuned model:

- Copy the content of BASE_MODEL.yaml to custom_model.yaml.
- Update the following fields:
  - Model name: Change it to custom_model.
  - Checkpoint path: Set it to $EXPERIMENT_DIR/$EXPERIMENT_NAME.pt.
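After those two edits, the card might look like the fragment below. The exact keys come from BASE_MODEL.yaml in this repository, so the field names here are illustrative assumptions rather than a verified schema; keep every other field from the base card unchanged:

```yaml
# custom_model.yaml -- hypothetical card; copy BASE_MODEL.yaml and edit
# only the model name and checkpoint path (field names are assumptions).
name: custom_model
checkpoint: "file://$EXPERIMENT_DIR/$EXPERIMENT_NAME.pt"
```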
Specify the new model in the model_name field when using the translator:

```python
import torch
# Import path may vary with the seamless_communication version in this repo.
from seamless_communication.inference import Translator

# Initialize a Translator object with the new model.
translator = Translator("custom_model", "vocoder_36langs", torch.device("cuda:0"), torch.float16)

# Predict
text_output, _ = translator.predict(
    input=<path_to_input_audio>,
    task_str="ASR",
    tgt_lang=<tgt_lang>,
    text_generation_opts=text_generation_opts,
    unit_generation_opts=None,
)
```
For more details on inference, visit here
## Authors

- Abhishek Gupta - MTech, CSE, IIT Bombay
- Amruta Parulekar - DD, EE, IIT Bombay
- Sameep Chattopadhyay - DD, EE, IIT Bombay
- Preethi Jyothi - Associate Professor, CSE, IIT Bombay
## Citation

If you use this code for your research, please consider citing our work.
## License

Distributed under the MIT License. See LICENSE for more information.