Latest AnCora corpus for training spaCy 3.2 for Catalan, prepared by the Text Mining Unit of the Barcelona Supercomputing Center from the UD version.
- Added IOB-NER labels in the last column
- Normalized lemmas
- Added "SpaceAfter=No" after verbs followed by a clitic and before apostrophes
- Modified some column1 forms to make them match the text form
- Removed multi-word token lines
- Some minor fixes
- Created new splits from the UD corpus that increase the size of the train set.
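
As a rough illustration of how the changes listed above might be consumed, the sketch below reads one sentence, collects the IOB-NER labels from the last column, and rebuilds the surface text using "SpaceAfter=No". It assumes the standard ten tab-separated CoNLL-U columns followed by an eleventh NER column; that layout, the sample sentence, and the helper names `parse_sentence` and `detokenize` are hypothetical and not taken from the corpus files.

```python
# Hypothetical sample: 10 CoNLL-U columns plus an assumed 11th IOB-NER column.
ROWS = [
    ["1", "L'", "el", "DET", "_", "_", "2", "det", "_", "SpaceAfter=No", "O"],
    ["2", "Anna", "Anna", "PROPN", "_", "_", "3", "nsubj", "_", "_", "B-PER"],
    ["3", "viu", "viure", "VERB", "_", "_", "0", "root", "_", "_", "O"],
    ["4", "a", "a", "ADP", "_", "_", "5", "case", "_", "_", "O"],
    ["5", "Barcelona", "Barcelona", "PROPN", "_", "_", "3", "obl", "_", "SpaceAfter=No", "B-LOC"],
    ["6", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_", "O"],
]
SAMPLE = "# text = L'Anna viu a Barcelona.\n" + "\n".join("\t".join(r) for r in ROWS)


def parse_sentence(block):
    """Return a list of token dicts for one sentence block."""
    tokens = []
    for line in block.splitlines():
        # Skip blank lines and comment lines such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        misc = cols[9].split("|")  # MISC column holds SpaceAfter=No
        tokens.append({
            "form": cols[1],
            "lemma": cols[2],
            "ner": cols[-1],  # assumed: NER label is the last column
            "space_after": "SpaceAfter=No" not in misc,
        })
    return tokens


def detokenize(tokens):
    """Rebuild the surface text from forms and the space_after flags."""
    pieces = [t["form"] + (" " if t["space_after"] else "") for t in tokens]
    return "".join(pieces).strip()


if __name__ == "__main__":
    sentence = parse_sentence(SAMPLE)
    print(detokenize(sentence))
    print([t["ner"] for t in sentence])
```

Comparing the rebuilt string against the sentence's `# text` comment, where one is present, is a simple way to check that the forms and spacing annotations agree.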