CSALT @ IITB

AMPS: ASR with Multimodal Paraphrase Supervision

Accepted to NAACL 2025

About The Repository

This repository hosts the code pertaining to our paper AMPS: ASR with Multimodal Paraphrase Supervision accepted to NAACL 2025.

The main contribution of our paper is 🔎 AMPS, a new technique that augments a multilingual multimodal ASR system with paraphrase-based supervision for improved conversational ASR.

Getting Started

1. Clone the Repository

git clone https://github.com/csalt-research/amps-asr.git
cd amps-asr

2. Install the dependencies:

fairseq2 dependencies:

cd fairseq2
pip install --editable .

seamless dependencies:

cd ../seamless
pip install --editable .

3. Troubleshooting

If you encounter any issues while installing dependencies, refer to the Installation Guide.

You are all set! 🎉

 


Data Preparation

Seamless expects the dataset in JSON Lines format: one JSON object per line, with the following structure:

{"source": {"id": "<ID>", "text": "<T2T-pipeline-input-text>", "lang": "<T2T-pipeline-input-language>", "audio_local_path": "<path-to-audio-file>", "sample_rate": <audio-sample-rate>, "waveform": null, "units": null}, "target": {"id": "<ID>", "text": "<ASR-pipeline-target-text>", "lang": "<ASR+T2T-pipeline-target-language>", "audio_local_path": null, "sample_rate": null, "waveform": null, "units": null, "paraphrase": "<T2T-pipeline-target-paraphrase>"}}
{...}
{...}
...

We have provided a sample dataset in the sample_data folder.
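As a sketch, one record of this manifest can be built and serialized with the standard library; the `make_record` helper and its argument names are ours, not part of the codebase, and the field layout simply mirrors the structure above:

```python
import json

def make_record(rec_id, src_text, src_lang, audio_path, sample_rate,
                tgt_text, tgt_lang, paraphrase):
    """Build one JSON Lines record in the structure described above.

    Fields that the pipeline fills in at runtime (waveform, units, and the
    target-side audio fields) are left as null/None, as in the sample data.
    """
    return {
        "source": {"id": rec_id, "text": src_text, "lang": src_lang,
                   "audio_local_path": audio_path, "sample_rate": sample_rate,
                   "waveform": None, "units": None},
        "target": {"id": rec_id, "text": tgt_text, "lang": tgt_lang,
                   "audio_local_path": None, "sample_rate": None,
                   "waveform": None, "units": None,
                   "paraphrase": paraphrase},
    }

record = make_record("utt_0001", "hello there", "eng",
                     "audio/utt_0001.wav", 16000,
                     "hello there", "eng", "hi there")
# One record per line -- no pretty-printing, since the format is JSON Lines.
line = json.dumps(record)
```

Writing one `json.dumps(record)` per line (with `"\n"` between records) produces a manifest matching the files in sample_data.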

Running experiments

Our codebase has a simple, easily customizable script, run.sh. Simply execute:

./run.sh s2t_loss_ratio t2t_loss_ratio loss_threshold

Note: loss_threshold is a tunable parameter that can help improve performance. By default, it is set to -1, meaning no thresholding is applied.

For example, to run only ASR fine-tuning without any thresholding, execute:

./run.sh 1 0 -1

To run AMPS with a threshold of 3.2, execute:

./run.sh 1 1 3.2
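Since the threshold is tunable, a small sweep over candidate values can be scripted around run.sh. The helper below is a sketch of ours, not part of the repo; the threshold values are illustrative:

```python
import subprocess

def amps_command(s2t_loss_ratio, t2t_loss_ratio, loss_threshold):
    """Build the argv for run.sh from its three positional arguments."""
    return ["./run.sh", str(s2t_loss_ratio), str(t2t_loss_ratio),
            str(loss_threshold)]

# Hypothetical sweep over loss thresholds for full AMPS training (s2t=1, t2t=1).
for threshold in (-1, 2.8, 3.2, 3.6):
    cmd = amps_command(1, 1, threshold)
    # subprocess.run(cmd, check=True)  # uncomment to actually launch each run
```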

Inference

After fine-tuning, the model is saved in the directory $EXPERIMENT_DIR.
We then need to create a new .yaml card (say, custom_model.yaml) for the newly fine-tuned model.

Steps to create custom_model.yaml

  1. Copy the content of BASE_MODEL.yaml to custom_model.yaml.
  2. Update the following fields:
    • Model name: Change it to custom_model.
    • Checkpoint path: Set it to $EXPERIMENT_DIR/$EXPERIMENT_NAME.pt.
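The two steps above can be scripted. This sketch assumes PyYAML is available and that the card uses "name" and "checkpoint" keys; verify both against your actual BASE_MODEL.yaml, which may use different field names:

```python
import yaml  # PyYAML; assumed available in the environment

def write_custom_card(base_card_path, new_card_path,
                      experiment_dir, experiment_name):
    """Copy a base model card and repoint it at the fine-tuned checkpoint.

    The 'name' and 'checkpoint' keys are assumptions -- check the exact
    field names in your BASE_MODEL.yaml before relying on this.
    """
    with open(base_card_path) as f:
        card = yaml.safe_load(f)
    # Step 1 is the copy above; step 2 updates the two fields.
    card["name"] = "custom_model"
    card["checkpoint"] = f"{experiment_dir}/{experiment_name}.pt"
    with open(new_card_path, "w") as f:
        yaml.safe_dump(card, f)
```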

Using the new model for inference:

Specify the new model in the model_name field when using the translator:

# Initialize a Translator object with the new model.
# Note: the exact import path may differ across seamless versions.
import torch
from seamless_communication.inference import Translator

translator = Translator("custom_model", "vocoder_36langs", torch.device("cuda:0"), torch.float16)

# Predict. text_generation_opts must be constructed beforehand
# (see the seamless inference docs for the options class).
text_output, _ = translator.predict(
    input=<path_to_input_audio>,
    task_str="ASR",
    tgt_lang=<tgt_lang>,
    text_generation_opts=text_generation_opts,
    unit_generation_opts=None
)

For more details on inference, refer to the seamless inference documentation.

Authors

Citation

If you use this code for your research, please consider citing our work.

License

Distributed under the MIT License. See LICENSE for more information.
