This repository contains the implementation of our two proposed semi-supervised approaches, CSS and PSS, for bilingual lexicon induction (BLI).
- Python 3.7
- PyTorch
- NumPy
- Faiss
You need to download the MUSE dataset from here to the ./muse_data directory.
You need to download the VecMap dataset from here to the ./vecmap_data directory.
You can run the following command to evaluate CSS on the MUSE dataset with the "5k all" annotated lexicon:
python main.py --config_file ./configs/config-CSS-muse-en-es-5kall.yaml
You can run the following command to evaluate PSS on the VecMap dataset with the "5k all" annotated lexicon:
python main.py --config_file ./configs/config-PSS-vecmap-en-es-5kall.yaml
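To run both evaluations above one after the other, a small wrapper script can help. The following is a minimal sketch, not part of the repository: `build_command` and `run_all` are hypothetical helper names, and the sketch assumes it is run from the repository root, where `main.py` lives.

```python
import subprocess
import sys

# The two configuration files named in this README.
CONFIG_FILES = [
    "./configs/config-CSS-muse-en-es-5kall.yaml",
    "./configs/config-PSS-vecmap-en-es-5kall.yaml",
]


def build_command(config_file):
    """Return the argument list for one evaluation run (hypothetical helper)."""
    # sys.executable reuses the interpreter that is running this script.
    return [sys.executable, "main.py", "--config_file", config_file]


def run_all(config_files=CONFIG_FILES):
    """Run each evaluation in turn and stop at the first failing run."""
    for config_file in config_files:
        subprocess.run(build_command(config_file), check=True)
```

Calling `run_all()` launches the two runs in sequence; `check=True` raises an exception as soon as one of them exits with an error.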
We briefly describe some important fields in the configuration file:
- "method" specifies the model to evaluate: "CSSBli" for CSS or "PSSBli" for PSS.
- "src" and "tgt" indicate the source and target languages of the BLI task.
- "data_params/data_dir" specifies which dataset to use: "./muse_data/" for MUSE or "./vecmap_data/" for VecMap.
- "supervised/max_count" indicates the size of the annotated lexicon: "-1" for "5k all", "100" for "100 unique" and "5000" for "5000 unique".
Other fields specify the hyperparameters for CSS and PSS.
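As an illustration of how the fields described above fit together, the sketch below summarizes a configuration that has already been parsed into a Python dictionary. It is not part of the repository: `get_field` and `describe` are hypothetical helpers, and the nesting (a path such as "data_params/data_dir" read as key "data_dir" under section "data_params") is an assumption about the YAML layout.

```python
# Hypothetical helpers, not part of the repository.

# Value-to-name mappings taken from the field descriptions in this README.
METHODS = {"CSSBli": "CSS", "PSSBli": "PSS"}
DATASETS = {"./muse_data/": "MUSE", "./vecmap_data/": "VecMap"}
LEXICONS = {-1: "5k all", 100: "100 unique", 5000: "5000 unique"}


def get_field(config, path):
    """Look up a slash-separated path such as 'data_params/data_dir'."""
    node = config
    for key in path.split("/"):
        node = node[key]
    return node


def describe(config):
    """Return a one-line summary of the experiment a configuration selects."""
    method = METHODS[get_field(config, "method")]
    dataset = DATASETS[get_field(config, "data_params/data_dir")]
    lexicon = LEXICONS[get_field(config, "supervised/max_count")]
    pair = f'{get_field(config, "src")}-{get_field(config, "tgt")}'
    return f"{method} on {dataset}, {pair}, {lexicon} lexicon"


# Assumed nesting: 'a/b' is read as key 'b' inside section 'a'.
example_config = {
    "method": "CSSBli",
    "src": "en",
    "tgt": "es",
    "data_params": {"data_dir": "./muse_data/"},
    "supervised": {"max_count": -1},
}

print(describe(example_config))
```

An unknown method, dataset directory, or lexicon size raises a `KeyError`, which makes a mistyped value easy to spot before starting a long run.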