Releases: projecte-aina/spacy

Release Corpus 3.2.9

20 Oct 15:29

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

Latest

Latest AnCora corpus for training spaCy 3.2 for Catalan, prepared by the Text Mining Unit of the Barcelona Supercomputing Center from the UD version.

- Added IOB-NER labels in the last column
- Normalized lemmas
- Added "SpaceAfter=No" after a verb followed by a clitic, and before apostrophes
- Modified some column 1 forms so that they match the text form
- Removed multi-word token lines
- Some minor fixes
- Created new splits from the UD corpus that increase the size of the training set
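
The cleanup steps above can be illustrated with a small reader for one sentence of the corpus. This is a minimal sketch, not the BSC tooling: it assumes standard 10-column CoNLL-U token lines with `SpaceAfter=No` in the MISC column, and it assumes (not confirmed by these notes) that the IOB-NER label sits in an extra 11th tab-separated column. It skips multi-word token ranges such as `2-3` and empty nodes such as `8.1`, rebuilds the sentence text from the forms and `SpaceAfter=No`, and groups the IOB tags into entity spans. The names `read_sentence` and `iob_to_spans` are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, label)


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Group IOB tags such as B-LOC / I-LOC / O into entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an open span
        # An I- tag whose label differs from the open span is treated as a new start.
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if tag != "O":
                start, label = i, tag[2:]
    return spans


def read_sentence(lines: List[str]) -> Tuple[List[str], str, List[Span]]:
    """Return the tokens, reconstructed text and entity spans of one sentence."""
    forms: List[str] = []
    tags: List[str] = []
    text_parts: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line or sentence-level comment (sent_id, text, ...)
        fields = line.split("\t")
        token_id = fields[0]
        if "-" in token_id or "." in token_id:
            continue  # multi-word token range (e.g. "2-3") or empty node (e.g. "8.1")
        form = fields[1]
        misc = fields[9].split("|") if len(fields) > 9 else []
        # Assumption: an optional 11th column holds the IOB-NER tag.
        tags.append(fields[10] if len(fields) > 10 else "O")
        forms.append(form)
        text_parts.append(form)
        if "SpaceAfter=No" not in misc:
            text_parts.append(" ")
    return forms, "".join(text_parts).rstrip(" "), iob_to_spans(tags)
```

If a file still contains a range line such as `2-3 dels`, the reader ignores it and keeps the separate tokens `de` and `els`, so the rebuilt text reads `de els` rather than the surface form `dels`.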

Assets 3

3.2.8

22 Jul 12:13

20210714
Dataset built from version 2.8, in which we have:
- removed the multi-word tokens (they now remain as two or more independent tokens); this includes preposition+article and verb+clitic
- normalized the lemmas of prepositions and pronouns
- added the hyphen to enclitics

Assets 3

Minor improvements to training data before 3.1 release

13 Jul 09:23

Minor improvements to the training data before the 3.1 release. Elimination of multi-word tokens from the UD data.

Assets 3

New model Releases

11 Jun 12:41

3.2.6 releases, with small errors in the training datasets corrected. Improved evaluation and better lemmatization, POS tagging and sentence segmentation.

Assets 5

Training Datasets from Ancora

10 Jun 10:31

Pre-release

Training datasets from AnCora, with some minor corrections to the SpaceAfter=No tag.

Assets 3

Releases using tar.gz

02 Jun 07:10

Experimental releases using tar archives for serverless environments
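
As an illustration of how a tar.gz release might be used in a serverless function, here is a minimal sketch that unpacks an archive into a writable temporary directory (on AWS Lambda, for example, only /tmp is writable) and then locates the pipeline folder inside it. The archive layout is an assumption: the sketch looks for a directory containing spaCy v3's `config.cfg` and `meta.json`. `extract_model` and `find_pipeline_dir` are hypothetical helpers, not part of this repository.

```python
import os
import tarfile
import tempfile
from typing import Optional


def extract_model(archive_path: str, target_dir: Optional[str] = None) -> str:
    """Unpack a .tar.gz model archive into a writable directory and return it."""
    target_dir = target_dir or tempfile.mkdtemp()  # typically somewhere under /tmp
    root = os.path.realpath(target_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Refuse members that would be written outside the target directory.
        for member in tar.getmembers():
            dest = os.path.realpath(os.path.join(root, member.name))
            if dest != root and not dest.startswith(root + os.sep):
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Use the "data" extraction filter when this Python version provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(root, **kwargs)
    return root


def find_pipeline_dir(root: str) -> Optional[str]:
    """Return the first directory that looks like a spaCy v3 pipeline (assumed layout)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "config.cfg" in filenames and "meta.json" in filenames:
            return dirpath
    return None
```

Assuming spaCy is installed in the function's environment, the returned path could then be passed to `spacy.load()`, e.g. `spacy.load(find_pipeline_dir(extract_model(path)))`. Extracting once per cold start and caching the loaded pipeline in a module-level variable avoids paying the unpacking cost on every request.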

Assets 5

Training Datasets from Ancora

01 Jun 11:55

New training datasets with improved pronoun lemmatization and tokenization.

Assets 3

Lookup tables for lemmatizer

21 May 10:35

Lemma lookup tables to incorporate into the lemmatizer.
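
A lookup-table lemmatizer simply maps each word form to a lemma. The sketch below assumes, hypothetically, that a table asset is a JSON object of `form -> lemma` pairs (the shape used by spaCy's lookup tables, but not confirmed for these assets); unknown forms fall back to their lowercase entry and then to the form itself. Within spaCy v3, such a table is normally supplied to the `lemmatizer` component in `"lookup"` mode as the `lemma_lookup` table rather than applied by hand; the helper names here are illustrative only.

```python
import json
from typing import Dict, List

LemmaTable = Dict[str, str]


def load_table(path: str) -> LemmaTable:
    """Load a form -> lemma table stored as a JSON object (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def lookup_lemma(form: str, table: LemmaTable) -> str:
    """Exact match first, then the lowercased form, else the form unchanged."""
    if form in table:
        return table[form]
    return table.get(form.lower(), form)


def lemmatize(tokens: List[str], table: LemmaTable) -> List[str]:
    """Lemmatize a token list by independent table lookups (no context used)."""
    return [lookup_lemma(tok, table) for tok in tokens]
```

Because each token is looked up independently, a pure lookup table cannot disambiguate homographs; that is the usual trade-off against rule-based or trained lemmatizers.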

Assets 3

Fasttext ca embeddings for spacy

20 May 13:32

fastText embeddings from the TextCat corpus, as described in https://doi.org/10.5281/zenodo.4522040.
We use the 300-dimensional CBOW vectors, converted for spaCy.
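
fastText's plain-text export (`.vec`) starts with a header line giving the vocabulary size and the vector dimension (300 for these CBOW vectors), followed by one line per word with its components separated by spaces. The minimal pure-Python sketch below reads that format and compares two words by cosine similarity; the function names are hypothetical, and with spaCy v3 the usual conversion route is the `python -m spacy init vectors ca <vectors_file> <output_dir>` command rather than hand-written parsing.

```python
import math
from typing import Dict, Iterable, List, Tuple

Vectors = Dict[str, List[float]]


def read_vec(lines: Iterable[str]) -> Tuple[int, Vectors]:
    """Parse fastText text format: 'count dim' header, then 'word v1 ... vdim'."""
    it = iter(lines)
    _count, dim = (int(x) for x in next(it).split())
    vectors: Vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines (e.g. tokens that contain spaces)
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Storing each vector as a Python list keeps the sketch dependency-free but is memory-hungry for a full 300-dimensional vocabulary; spaCy instead keeps vectors in a single contiguous array, which is one reason to prefer its conversion command for real use.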

Assets 3

ca_core_web_lg

20 May 09:59

Base model without the BERTa transformer, using only fastText embeddings.

Assets 3
