
Release Corpus 3.2.9

@cayorodriguez cayorodriguez released this 20 Oct 15:29
· 1 commit to main since this release
3a7f46f

Latest AnCora Corpus for training spaCy 3.2 for Catalan, prepared by the Text Mining Unit of the Barcelona Supercomputing Center from the UD version.

  • Added IOB-NER labels in the last column
  • Normalized lemmas
  • Added "SpaceAfter=No" after verbs followed by a clitic and before apostrophes
  • Modified some column 1 forms to make them match the text form
  • Removed multi-word token lines
  • Some minor fixes
  • Created new splits from the UD corpus that increase the size of the training set.
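
The changes above affect how token lines can be read. The following is a minimal sketch, not the corpus's official tooling. It assumes tab-separated, CoNLL-U-style token lines in which the form is the second field and the IOB-NER label is the last field. The helper names (`parse_token_line`, `detokenize`) and the sample rows are hypothetical and are not taken from the corpus.

```python
def parse_token_line(line):
    """Split one tab-separated token line into the fields used here.

    Assumption: the form is the second field and the IOB-NER label is the
    last field. "SpaceAfter=No" is looked up in every field so the sketch
    does not depend on which column carries it.
    """
    cols = line.rstrip("\n").split("\t")
    no_space = any("SpaceAfter=No" in col.split("|") for col in cols)
    return {"form": cols[1], "ner": cols[-1], "space_after": not no_space}


def detokenize(tokens):
    """Rebuild the surface text, honouring the SpaceAfter=No flag."""
    parts = []
    for tok in tokens:
        parts.append(tok["form"])
        if tok["space_after"]:
            parts.append(" ")
    return "".join(parts).strip()


# Hypothetical rows for illustration only (not copied from the corpus).
sample = [
    "1\tL'\tel\tDET\t_\t_\t2\tdet\t_\tSpaceAfter=No\tO",
    "2\tAnna\tAnna\tPROPN\t_\t_\t3\tnsubj\t_\t_\tB-PER",
    "3\tviu\tviure\tVERB\t_\t_\t0\troot\t_\t_\tO",
    "4\ta\ta\tADP\t_\t_\t5\tcase\t_\t_\tO",
    "5\tBarcelona\tBarcelona\tPROPN\t_\t_\t3\tobl\t_\tSpaceAfter=No\tB-LOC",
    "6\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\tO",
]

tokens = [parse_token_line(row) for row in sample]
text = detokenize(tokens)
entities = [(t["form"], t["ner"]) for t in tokens if t["ner"] != "O"]
```

With the sample rows, `text` holds the reconstructed sentence and `entities` lists the tokens whose label is not `O`. Check the actual column layout of the released files before relying on these field positions.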