Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor document #93

Closed
wants to merge 23 commits into from
Closed

refactor document #93

wants to merge 23 commits into from

Conversation

ArneBinder
Copy link
Owner

@ArneBinder ArneBinder commented Mar 3, 2022

This PR cleans up and simplifies the Document.

It contains the following changes:

  • annotations and predictions as attributes of Document with type AnnotationCollection to reuse its functionality
  • add class AnnotationCollection as mapping from layer names to AnnotationLayers with some helper methods.
  • add class AnnotationLayer that derives from List, but with some sanity checks and typed getters (optional)
  • add construct_document that allows to easily create a Document from a text, spans and binary_relations (both as mapping from names to iterables of the respective annotations). This was previously sitting in tests.helpers.document_utils, but seems to be useful also in other places than tests.
  • fix pipeline test for TransformerRETextClassificationTaskModule: use pytest.approx to compare floats

@ArneBinder ArneBinder changed the title [WIP] refactor document refactor document Mar 3, 2022
@ArneBinder
Copy link
Owner Author

This should be ready now. @ChristophAlt could you have a look?

@ArneBinder ArneBinder requested a review from ChristophAlt March 3, 2022 22:02
@ArneBinder ArneBinder added the refactoring Refactoring label Mar 3, 2022
def has_layer(self, name: str) -> bool:
return name in self

def add(self, name: str, annotation: Optional[Annotation] = None, create_layer: bool = False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this to complicated and it will certainly confuse the user. Just automatically create the layer if it's not present and then there's also no need to make an annotation optional.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm good with removing the create_layer parameter. However, there is the use case where we want to add just an empty layer. Do you think it makes more sense to provide an add_layer method instead of having annotation optional?

@ChristophAlt
Copy link
Collaborator

ChristophAlt commented Mar 4, 2022

This is very interesting! I like it.

Two remarks:

  • the add method is too complicated and should be simplified. I don't see a reason why a layer shouldn't be created automatically if it doesn't exist and adding an annotation to a layer should obviously not make the annotation optional.
  • I honestly don't like the .as_{X} -- I know it's optional but let's be honest nobody is going to use it because it's quite unintuitive. I have to think a little bit about this one. Maybe we have to change our approach for how predictions and annotations are managed.

@ArneBinder
Copy link
Owner Author

Thanks for the feedback! I addressed your first point above above.

Regarding the .as_{X} getters in AnnotationLayer: I personally use the .as_{X} a lot. It is much better readable then adding cast(List[X], doc.annotations[xxx]) or #type: ignore which is even more annoying (often, the assigned variable has to be typed again in theses cases). Example:

entities = doc.annotations[self.entity_annotation].as_spans

vs.

entities = cast(List[SpanAnnotation], doc.annotations[self.entity_annotation])
# or
entities: List[SpanAnnotation] = doc.annotations[self.entity_annotation]  #type: ignore

If AnnotationLayer is to complicated in your opinion, I would be fine with stripping the internal type checking (every usage of _check_type) away, however, I would really like to have this (optional) functionality.

@ChristophAlt
Copy link
Collaborator

ChristophAlt commented Mar 4, 2022

It's not about the usage of as_{X} -- it's certainly convenient given the necessity for the casts. The problem is that the casts are necessary in the first place, which might suggest our current approach of document annotations and predictions is flawed.

The same goes for the necessity of adding empty layers to control certain taskmodule behavior -- this doesn't seem the proper way to do it.

This is not a critique of your implementation but highlights some missing pieces in the current implementation.

…of new AnnotationCollection.create_layer method
@ArneBinder ArneBinder mentioned this pull request Mar 4, 2022
@ArneBinder
Copy link
Owner Author

ArneBinder commented Mar 4, 2022

@ChristophAlt I just implemented another version of this which uses explicit types: #95. I like the other one more, what do you think?

@ArneBinder
Copy link
Owner Author

This is discarded in favor of #95.

@ArneBinder ArneBinder closed this Mar 4, 2022
@ChristophAlt ChristophAlt deleted the refactor/document branch May 3, 2022 06:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants