Windowing for token classification #72
Conversation
It looks good. However, I'm still not convinced that adding this windowing and partitioning to the taskmodule is a good idea. It considerably increases code complexity, and all the offset magic makes me wonder how many edge cases remain.
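To make the concern concrete, here is a minimal sketch (not the taskmodule's actual code, all names are hypothetical) of the kind of "offset magic" windowing entails: splitting a token sequence into overlapping windows and mapping window-local span predictions back to document offsets. The off-by-one opportunities live exactly in these two functions.

```python
def make_windows(tokens, max_length, stride):
    """Yield (window_start, window) pairs of at most max_length tokens,
    with consecutive windows overlapping by `stride` tokens."""
    start = 0
    while start < len(tokens):
        yield start, tokens[start:start + max_length]
        if start + max_length >= len(tokens):
            break  # last window already covers the end of the document
        start += max_length - stride

def to_document_span(window_start, local_start, local_end):
    """Map a span predicted inside a window back to document coordinates."""
    return window_start + local_start, window_start + local_end

tokens = list(range(10))  # stand-in for token ids
windows = list(make_windows(tokens, max_length=4, stride=1))
# windows: [(0, [0, 1, 2, 3]), (3, [3, 4, 5, 6]), (6, [6, 7, 8, 9])]
```

Note that spans crossing a window boundary and duplicate predictions in the overlap region still need extra handling, which is where most of the edge cases hide.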
The more I think about it, the more I'm convinced that it's a bad idea to integrate windowing into the same taskmodule that provides the "base" functionality. It just shouldn't be there: it operates on a completely different level of abstraction, and it would require us to replicate the same functionality over and over across current and future taskmodules. The windowing logic is also quite complicated, which increases the probability of introducing a bug into the base functionality.

Why don't we have a "windowing wrapper" that wraps a taskmodule and extends it with windowing functionality and different windowing strategies, but relies on the general logic of the wrapped taskmodule? This would separate windowing from the encoding/decoding logic, make the code much cleaner, and make it reusable for other taskmodules of the same task. If this is not possible, the current abstraction should be changed to support it.

-- We'll keep it like this for now and come back to this at a later date.
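The wrapper idea above could look roughly like this. This is only a sketch of the pattern, not the actual pytorch-ie API: the class and method names (`WindowingWrapper`, `encode`, `decode`, the `window_fn`/`merge_fn` strategies) are hypothetical.

```python
class WindowingWrapper:
    """Wraps a taskmodule and adds windowing on top of its encode/decode logic.

    window_fn: splits a document into partial documents (the windowing strategy).
    merge_fn:  merges per-window decoded results back into document-level results.
    """

    def __init__(self, taskmodule, window_fn, merge_fn):
        self.taskmodule = taskmodule  # provides the base encode/decode logic
        self.window_fn = window_fn
        self.merge_fn = merge_fn

    def encode(self, document):
        # Windowing happens here; the actual encoding is fully delegated,
        # so the base taskmodule stays free of windowing logic.
        return [self.taskmodule.encode(part) for part in self.window_fn(document)]

    def decode(self, encoded_parts):
        parts = [self.taskmodule.decode(part) for part in encoded_parts]
        return self.merge_fn(parts)
```

With this shape, a new windowing strategy is just a different `window_fn`/`merge_fn` pair, and any taskmodule for the same task can be wrapped without touching its code.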
…quence() and _encode_text() to ease testing
force-pushed from 59983b2 to 09fd9e8
This is fully implemented, but it requires #71 to be merged first (this PR is against main, so we hopefully won't have any issues with that).

EDIT: This is fully functional; it is currently training on the SciArg corpus.