Incompatible with spacy v2.2.3? #18

Closed
mcswell opened this issue May 21, 2020 · 10 comments

Comments

@mcswell

mcswell commented May 21, 2020

I have spacy v2.1.9 installed on one machine, and 2.2.3 (the current latest version) on another. I installed spacy-ru on both, but it only runs well on the 2.1.9 machine. On the 2.2.3 machine, when I do the
doc=nlp(s)
step (with s=Russian text), I get the error

doc=nlp(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib64/python3.6/site-packages/spacy/language.py", line 435, in __call__
doc = proc(doc, **component_cfg.get(name, {}))
File "pipes.pyx", line 397, in spacy.pipeline.pipes.Tagger.__call__
File "pipes.pyx", line 442, in spacy.pipeline.pipes.Tagger.set_annotations
File "morphology.pyx", line 312, in spacy.morphology.Morphology.assign_tag_id
File "morphology.pyx", line 200, in spacy.morphology.Morphology.add
ValueError: [E167] Unknown morphological feature: 'Person' (2313063860588076218). 
This can happen if the tagger was trained with a different set of morphological features. 
If you're using a pretrained model, make sure that your models are up to date:
python -m spacy validate

I guess I could build spacy-ru from source and maybe this would solve the problem, but I'm not sure I'm up to that. What I did instead was to uninstall version 2.2.3 of spacy, and install version 2.1.9 in its place, so now spacy-ru works on both machines.

But I'd rather be using the current version of spacy, which I use for a couple other languages as well. (Even better, I'd like spacy-ru to be immune to version changes in spacy, but I suppose that's asking a bit much :-).)

Is there a (simple) way to make spacy-ru compatible with v2.2 of spacy?

@buriy
Owner

buriy commented May 22, 2020

This is a bug in spaCy: it doesn't allow the numerical feature values used in the SynTagRus dataset the model was trained on ("Person=1", "Person=2", "Person=3").
I have a version with these tags changed (to "Person=first" etc.) that works correctly with the 2.2 branch.
I'll prepare and upload it early next week.
You can also make this change in the dataset and train the model yourself in several hours (see the Makefile).
It's just that I'm preparing a version with vectors properly integrated, which should improve the resulting POS and DEP quality a little bit.
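
For anyone who wants to try the dataset change by hand, here is a minimal sketch of rewriting the FEATS column of a CoNLL-U file. It is an illustration, not the script used for spacy-ru: the helper names are hypothetical, and the "second"/"third" spellings are assumed by analogy with "Person=first".

```python
# Hypothetical mapping; only "first" is named in the thread, the rest are assumed.
PERSON_NAMES = {"1": "first", "2": "second", "3": "third"}


def rename_person(feats):
    """Replace numeric Person values in a CoNLL-U FEATS string."""
    if feats == "_":  # "_" means "no features" in CoNLL-U
        return feats
    renamed = []
    for pair in feats.split("|"):
        name, sep, value = pair.partition("=")
        if sep and name == "Person" and value in PERSON_NAMES:
            pair = name + "=" + PERSON_NAMES[value]
        renamed.append(pair)
    return "|".join(renamed)


def rewrite_conllu_line(line):
    """Rewrite one line of a CoNLL-U file, leaving comments and blank lines alone."""
    if not line.strip() or line.startswith("#"):
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:  # not a token line, leave untouched
        return line
    cols[5] = rename_person(cols[5])  # FEATS is the sixth column
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(cols) + newline
```

Applying `rewrite_conllu_line` to every line of the training and dev files before training should be enough for this particular feature; other numeric feature values, if any, would need their own mapping.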

@buriy
Owner

buriy commented May 22, 2020

And btw the latest version is 2.2.4 https://pypi.org/project/spacy/#history :)

@mcswell
Author

mcswell commented May 22, 2020

Thank you for the quick reply! I don't have a GPU (at least not one that works for ML), so I guess I'll wait until next week.

And I wish I could speak Russian like you do English :-)

@mcswell
Author

mcswell commented May 23, 2020 via email

@buriy
Owner

buriy commented May 23, 2020

Oh, you're right. spaCy re-capitalizes the lemmas, so I will need to do the same in the Russian version. Thanks for pointing it out, somehow I missed it completely.
Please note that in spaCy this behavior is inconsistent and depends on whether the POS tagger was used, etc.
How it works: each token has a shape attribute (token.shape_), which can be Xxx, XXX, xxx and so on, and which is then used to restore the capitalization. Only very rare words have mixed capitalization like "spaCy"; those will be updated to whatever their shape shows.
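
A rough sketch of the idea, assuming a simplified shape function: this is not spaCy's implementation, and both function names are hypothetical. Mixed-case shapes are left lowercase here, which is a choice made for this sketch only.

```python
def word_shape(text):
    """Simplified shape: X for uppercase, x for lowercase, d for digits."""
    shape = []
    for ch in text:
        if ch.isalpha():
            shape.append("X" if ch.isupper() else "x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # punctuation and other characters are kept as-is
    return "".join(shape)


def recapitalize(lemma, shape):
    """Restore the capitalization of a lowercase lemma from the token's shape."""
    letters = [c for c in shape if c in "Xx"]
    if letters and all(c == "X" for c in letters):
        return lemma.upper()  # XXX -> all caps
    if letters and letters[0] == "X" and all(c == "x" for c in letters[1:]):
        return lemma.capitalize()  # Xxx -> title case
    return lemma  # xxx or mixed case: leave the lemma unchanged
```

For example, `recapitalize("москва", word_shape("Москва"))` gives back a capitalized lemma, while a token shaped like "spaCy" keeps its lowercase lemma in this sketch.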

@lexmosolov

Will "ru2" work well with version 2.3.0?

@mcswell
Author

mcswell commented Jun 24, 2020

I've installed spacy v2.3.0:
>>> spacy.__version__
'2.3.0'
When I load the existing version of ru2 using
nlp = spacy.load(<localFile>)
I get a warning that

Model 'ru_model' (0.2) requires spaCy v2.1 and is incompatible with the current spaCy version (2.3.0).

And when I try to use nlp(<RussianSentence>), I get the error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/spacy/language.py", line 446, in __call__
doc = proc(doc, **component_cfg.get(name, {}))
File "pipes.pyx", line 398, in spacy.pipeline.pipes.Tagger.__call__
File "pipes.pyx", line 443, in spacy.pipeline.pipes.Tagger.set_annotations
File "morphology.pyx", line 315, in spacy.morphology.Morphology.assign_tag_id
File "morphology.pyx", line 203, in spacy.morphology.Morphology.add
ValueError: [E167] Unknown morphological feature: 'Person' (2313063860588076218).
This can happen if the tagger was trained with a different set of morphological features.
If you're using a pretrained model, make sure that your models are up to date:
python -m spacy validate

So it looks like the answer is no.
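
The warning suggests that a model is tied to one spaCy minor series. A minimal sketch of checking that before calling the pipeline, assuming the requirement is given as a plain "major.minor" string like the "v2.1" in the warning (the function names are hypothetical; `python -m spacy validate`, mentioned in the error message, is the supported way to check installed models):

```python
def parse_minor(version):
    """Return the (major, minor) pair of a version string such as '2.2.3' or '2.1'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def model_matches_spacy(spacy_version, model_requires):
    """True when the installed spaCy and the model target the same major.minor series."""
    return parse_minor(spacy_version) == parse_minor(model_requires)


# Versions taken from the messages in this thread.
print(model_matches_spacy("2.1.9", "v2.1"))  # same series
print(model_matches_spacy("2.3.0", "v2.1"))  # different series
```

In practice the first argument would be `spacy.__version__`. The sketch does not handle range specifiers such as ">=2.1.0".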

@buriy
Owner

buriy commented Jun 25, 2020

We'll have a version for Spacy 2.2 and Spacy 2.3 on Monday.

@gonzagazzz

Looking forward to the 2.3 support!

@buriy
Owner

buriy commented Jul 10, 2020

I've just published a SynTagRus-based POS & DEP model for 2.3; a NER model and an MIT-licensed POS & DEP model will be published several days later.
https://github.com/buriy/spacy-ru/releases/tag/v2.3_pre1

How to use it: unpack into your project root folder, then

import ru2_syntagrus
ru2_syntagrus.load_ru2('path_to/ru2_syntagrus')

Or you could just use spacy.load('path_to/ru2_syntagrus/'), but then the lemmas will be a bit worse.

@buriy buriy closed this as completed Jul 12, 2020