Release Corpus 3.2.9

Latest

cayorodriguez released this 20 Oct 15:29

· 1 commit to main since this release

3a7f46f

Latest AnCora corpus for training spaCy 3.2 for Catalan, prepared by the Text Mining Unit of the Barcelona Supercomputing Center from the UD version.

Added IOB-NER labels in the last column
Normalized lemmas
Added "SpaceAfter=No" after verbs followed by a clitic and before apostrophes
Modified some column 1 forms so that they match the text form
Removed multi-word token lines
Some minor fixes
Created new splits from the UD corpus that increase the size of the train set.
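
The changes listed above can be illustrated with a minimal reading sketch. It assumes a tab-separated CoNLL-U style layout with the standard ten columns (FORM at zero-based index 1, MISC at index 9) plus the IOB-NER label appended as a final column; the release does not spell out the exact layout, so check it against the actual files. The names `read_sentences`, `detokenize` and `SAMPLE` are hypothetical, and the sample sentence is invented for illustration rather than taken from the corpus.

```python
# Assumed layout: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC, NER
# (the position of MISC and the extra NER column are assumptions, not confirmed).

def read_sentences(lines):
    """Yield sentences as lists of (form, ner, space_after) tuples."""
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # comment line such as "# text = ..."
        fields = line.split("\t")
        token_id = fields[0]
        # Multi-word token ranges ("1-2") were removed in this release;
        # skip them defensively, along with empty nodes ("1.1").
        if "-" in token_id or "." in token_id:
            continue
        form = fields[1]
        misc = fields[9] if len(fields) > 9 else "_"
        ner = fields[-1]  # IOB label, assumed to be the last column
        space_after = "SpaceAfter=No" not in misc.split("|")
        sentence.append((form, ner, space_after))
    if sentence:
        yield sentence


def detokenize(sentence):
    """Rebuild the surface text, honouring SpaceAfter=No."""
    parts = []
    for form, _ner, space_after in sentence:
        parts.append(form)
        if space_after:
            parts.append(" ")
    return "".join(parts).rstrip()


# Invented sample, not taken from the corpus.
SAMPLE = (
    "# text = L'Anna canta\n"
    "1\tL'\tel\tDET\t_\t_\t2\tdet\t_\tSpaceAfter=No\tO\n"
    "2\tAnna\tAnna\tPROPN\t_\t_\t3\tnsubj\t_\t_\tB-PER\n"
    "3\tcanta\tcantar\tVERB\t_\t_\t0\troot\t_\t_\tO\n"
    "\n"
)

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE.splitlines()):
        print(detokenize(sent), [ner for _form, ner, _space in sent])
```

Reading the label from the last field rather than a fixed index keeps the sketch usable if the files carry a different number of columns, but the MISC index would still need adjusting in that case.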
