nn-depparser

A re-implementation of nndep using PyTorch.

Currently used for training CoreNLP dependency parsing models.

Requires Stanza for some features (auto-tagging with CoreNLP via server).

Originally by Danqi Chen. Leave a GitHub Issue if you have any questions!

Example Usage

Train a model:

python train.py -l universal -d /path/to/data --train_file it-train.conllu --dev_file it-dev.conllu --test_file it-test.conllu --embedding_file /path/to/it-embeddings.txt --embedding_size 100 --random_seed 21 --learning_rate .005 --l2_reg .01 --epsilon .001 --optimizer adamw --save_path /path/to/experiment-dir --job_id experiment-name --corenlp_tags --corenlp_tag_lang italian --n_epoches 2000

Note that the command above automatically tags the input data with the CoreNLP tagger. CoreNLP and the Italian models (for this example) must therefore be on your CLASSPATH, and the latest version of Stanza must be installed.
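
Before starting a long training run, it can help to confirm that CLASSPATH mentions CoreNLP at all. The snippet below is a rough sketch, not part of this repository: the function name `corenlp_classpath_entries` is hypothetical, and the substring match on "corenlp" is only a heuristic that assumes your jar or directory names contain that string.

```python
import os


def corenlp_classpath_entries(classpath=None):
    """Return CLASSPATH entries whose path mentions 'corenlp' (heuristic only).

    If `classpath` is None, the CLASSPATH environment variable is read.
    This does not verify that the jars exist or that the models load.
    """
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")
    # CLASSPATH entries are separated by os.pathsep (':' on Unix, ';' on Windows).
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    return [entry for entry in entries if "corenlp" in entry.lower()]


if __name__ == "__main__":
    found = corenlp_classpath_entries()
    if found:
        print("CoreNLP-looking CLASSPATH entries:")
        for entry in found:
            print("  " + entry)
    else:
        print("No CLASSPATH entry mentions 'corenlp'; tagging will likely fail.")
```

An empty result only tells you that nothing in CLASSPATH looks like CoreNLP. A non-empty result is not a guarantee that the Italian models are present.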

Why is this done? When CoreNLP runs a dependency parser, it relies on part-of-speech tags produced by its own tagger. For optimal performance, the training and development data should therefore carry the same predicted tags that CoreNLP will supply at parse time.
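
To illustrate the idea of training on predicted rather than gold tags, here is a minimal sketch that overwrites one tag column of a CoNLL-U file with the output of a tagger. It is not the code `train.py` uses: `retag_conllu` is a hypothetical helper, the `tagger` argument stands in for whatever returns CoreNLP's predictions, and the choice of the XPOS column as the default is an assumption.

```python
from typing import Callable, List

# CoNLL-U columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
FORM_COL = 1
UPOS_COL = 3
XPOS_COL = 4


def retag_conllu(
    text: str,
    tagger: Callable[[List[str]], List[str]],
    column: int = XPOS_COL,  # assumption: which column to overwrite
) -> str:
    """Replace the tags in `column` with tagger predictions, sentence by sentence."""
    out_lines: List[str] = []
    block: List[str] = []  # lines of the sentence currently being read

    def flush() -> None:
        rows = [line.split("\t") for line in block]
        # Real tokens have a plain integer ID; this skips comments,
        # multiword ranges such as "1-2", and empty nodes such as "1.1".
        token_idx = [
            i for i, row in enumerate(rows)
            if not block[i].startswith("#") and row[0].isdigit()
        ]
        if token_idx:
            tags = tagger([rows[i][FORM_COL] for i in token_idx])
            if len(tags) != len(token_idx):
                raise ValueError("tagger must return one tag per token")
            for i, tag in zip(token_idx, tags):
                rows[i][column] = tag
        out_lines.extend("\t".join(row) for row in rows)
        block.clear()

    for line in text.splitlines():
        if line.strip() == "":  # a blank line ends a sentence
            flush()
            out_lines.append("")
        else:
            block.append(line)
    flush()  # handle a final sentence with no trailing blank line
    return "\n".join(out_lines) + "\n"
```

In practice the `tagger` callable would wrap a call to a running CoreNLP server. With a stub such as `lambda words: ["X"] * len(words)`, the function can be tried without any server.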

Convert to CoreNLP format:

python gen_model.py -o /path/to/italian-corenlp-parser.txt /path/to/experiment-dir/experiment-name