exclude spans that are out of valid window #128

ArneBinder · 2022-04-03T15:12:32Z

When running inference with a TransformerTokenClassificationTaskModule that was trained with windowing and an overlap, the task module also produces spans from model output that covers only the additional context, i.e. the overlapping part. This is undesirable because the model was not trained to produce correct predictions for that part: only the non-overlapping part was taken into account during training, so that every part of the input is used only "once".

This PR excludes spans that lie completely outside the valid window (the non-overlapping part). Predicted spans that are only partly inside the valid window are still kept, because they would otherwise be lost. Further post-processing steps have to take this into account, since the same span may be produced multiple times (i.e. from the end of one window and from the beginning of the next one). De-duplication is not handled here.
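A minimal sketch of the filtering rule described above, assuming spans and the valid window are `(start, end)` index pairs with an exclusive end. The names `Span`, `filter_spans_by_window` and `deduplicate`, as well as the signature of `has_overlap`, are illustrative assumptions and are not taken from the repository; the actual `utils.span.has_overlap` may look different.

```python
from typing import List, Tuple

# Assumption: a span is (start, end) with an exclusive end index.
Span = Tuple[int, int]


def has_overlap(a: Span, b: Span) -> bool:
    """Return True if the half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def filter_spans_by_window(spans: List[Span], valid_window: Span) -> List[Span]:
    """Keep spans that are at least partly inside the valid window.

    Spans that lie completely in the overlapping context are dropped.
    """
    return [span for span in spans if has_overlap(span, valid_window)]


def deduplicate(spans: List[Span]) -> List[Span]:
    """Hypothetical post-processing step: drop repeated spans, keeping first-seen order."""
    return list(dict.fromkeys(spans))


if __name__ == "__main__":
    valid_window = (4, 12)
    predicted = [(0, 3), (2, 6), (5, 8), (10, 14), (12, 14)]
    # (0, 3) and (12, 14) lie completely outside the valid window and are dropped;
    # (2, 6) and (10, 14) are only partly inside and are kept.
    print(filter_spans_by_window(predicted, valid_window))
```

Because partly overlapping spans are kept, two neighbouring windows can both report the same span. A step like `deduplicate` would have to run after the spans of all windows have been mapped to the same coordinate system; that step is outside the scope of this PR.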

ArneBinder added 3 commits April 1, 2022 18:27

exclude spans in create_annotations_from_output that are completely o…

47fce90

…ut of valid span

implement utils.span.has_overlap

23c8ff7

use utils.span.has_overlap to exclude spans outside the valid window

9a3920b

ArneBinder requested a review from ChristophAlt April 3, 2022 15:12

ArneBinder changed the title ~~exclude spans that are out of window~~ exclude spans that are out of valid window Apr 3, 2022

ArneBinder linked an issue Apr 3, 2022 that may be closed by this pull request

token classification with windowing should not contain predictions from pure context #127

Closed

ArneBinder added 2 commits April 3, 2022 18:08

fix order

daaafdc

fix check for overlap: end index is exclusive (non-pythonic); improve…

1fece7d

… test to contain a span that is only partly in valid window

ArneBinder added the bug label Apr 3, 2022

add note to test

06253a8

ChristophAlt merged commit 5128214 into main Apr 11, 2022

ChristophAlt deleted the fix/exclude_out_of_window_spans branch April 17, 2022 09:02
