Semi-Supervised Bilingual Lexicon Induction with Two-Way Message Passing Mechanisms

In this repository, we present the implementation of our two proposed semi-supervised approaches, CSS and PSS, for bilingual lexicon induction (BLI).

Dependencies

You need to download the MUSE dataset from here to the ./muse_data directory.

You need to download the VecMap dataset from here to the ./vecmap_data directory.
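
Before running anything, it can help to confirm that both dataset directories are in place. The following is a minimal sketch, not part of this repository; the helper name `missing_data_dirs` is hypothetical, and it only checks that the two directories named above exist, not that their contents are complete.

```python
from pathlib import Path

# Directory names taken from the download instructions above.
REQUIRED_DIRS = ("./muse_data", "./vecmap_data")


def missing_data_dirs(root=".", required=REQUIRED_DIRS):
    """Return the required dataset directories that do not exist under `root`.

    Hypothetical helper: it checks for directory existence only.
    """
    root = Path(root)
    return [d for d in required if not (root / d).is_dir()]


missing = missing_data_dirs()
if missing:
    print("Missing dataset directories:", ", ".join(missing))
else:
    print("Both dataset directories were found.")
```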

You can run the following command to evaluate CSS on the MUSE dataset with the "5k all" annotated lexicon:

python main.py --config_file ./configs/config-CSS-muse-en-es-5kall.yaml

You can run the following command to evaluate PSS on the VecMap dataset with the "5k all" annotated lexicon:

python main.py --config_file ./configs/config-PSS-vecmap-en-es-5kall.yaml
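
The two commands above differ only in the configuration file name. As a rough sketch, the command line can be assembled programmatically. The naming pattern `config-<method>-<dataset>-<src>-<tgt>-<lexicon>.yaml` is inferred from the two file names shown above and is an assumption; other combinations may not exist in ./configs. The helper `build_command` is hypothetical and only builds the argument list without running it.

```python
def build_command(method, dataset, src, tgt, lexicon):
    """Build the argument list for one evaluation run.

    The file name pattern is an assumption inferred from the two example
    configs; check ./configs for the files that actually exist.
    """
    config = f"./configs/config-{method}-{dataset}-{src}-{tgt}-{lexicon}.yaml"
    return ["python", "main.py", "--config_file", config]


# Reproduce the two commands shown above.
for method, dataset in [("CSS", "muse"), ("PSS", "vecmap")]:
    print(" ".join(build_command(method, dataset, "en", "es", "5kall")))
```

To launch a run from Python, the returned list can be passed to `subprocess.run(cmd, check=True)`.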

We briefly describe some important fields in the configuration file:

"method" specifies the model to evaluate: "CSSBli" for CSS or "PSSBli" for PSS.
"src" and "tgt" indicate the source and target languages of the BLI task.
"data_params/data_dir" specifies which dataset to use: "./muse_data/" for MUSE or "./vecmap_data/" for VecMap.
"supervised/max_count" indicates the size of the annotated lexicon: "-1" for "5k all", "100" for "100 unique" and "5000" for "5000 unique".

Other fields specify the hyperparameters for CSS and PSS.
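
The field descriptions above can be illustrated with a small sketch. It assumes that a slash in a field name (such as "supervised/max_count") denotes a nested key in the YAML file; the dictionary below is a hypothetical in-memory stand-in for a loaded config, and `get_field` and `summarize` are illustrative helpers rather than functions from this repository.

```python
from functools import reduce

# Hypothetical stand-in for a loaded config; the real YAML layout may differ.
example_config = {
    "method": "CSSBli",
    "src": "en",
    "tgt": "es",
    "data_params": {"data_dir": "./muse_data/"},
    "supervised": {"max_count": -1},
}

# Value-to-meaning mappings taken from the field descriptions above.
METHODS = {"CSSBli": "CSS", "PSSBli": "PSS"}
DATASETS = {"./muse_data/": "MUSE", "./vecmap_data/": "VecMap"}
LEXICONS = {-1: "5k all", 100: "100 unique", 5000: "5000 unique"}


def get_field(cfg, path):
    """Walk a nested dict using a slash-separated path."""
    return reduce(lambda node, key: node[key], path.split("/"), cfg)


def summarize(cfg):
    """Describe a config in one line using the documented field meanings."""
    method = METHODS[get_field(cfg, "method")]
    dataset = DATASETS[get_field(cfg, "data_params/data_dir")]
    lexicon = LEXICONS[get_field(cfg, "supervised/max_count")]
    pair = f"{get_field(cfg, 'src')}-{get_field(cfg, 'tgt')}"
    return f'{method} on {dataset}, {pair}, "{lexicon}" lexicon'


print(summarize(example_config))  # CSS on MUSE, en-es, "5k all" lexicon
```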
