Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windowing for token classification #72

Merged
merged 13 commits into from
Feb 23, 2022

Conversation

ArneBinder
Copy link
Owner

@ArneBinder ArneBinder commented Feb 20, 2022

This is fully implemented, but requires #71 to be merged first (this PR is against main, so we hopefully won't have any issues with that).

EDIT: This is fully functional (it is currently training at the SciArg corpus).

@ArneBinder ArneBinder changed the title [WIP] Windowing for token classification Windowing for token classification Feb 21, 2022
@ChristophAlt
Copy link
Collaborator

It looks good. However, I'm still not convinced that adding this windowing and partitioning to the taskmodule is a good idea. It considerably increases code complexity and all the offset magic makes me wonder how many edge cases there still are.

@ChristophAlt
Copy link
Collaborator

ChristophAlt commented Feb 23, 2022

The more I think about it the more I'm convinced that it's a bad idea to integrate windowing into the same taskmodule that provides the "base" functionality. It just shouldn't be there, because it operates on a completely different level of abstraction and it also requires us to replicate the same functionality over and over across current and future taskmodules. Also the logic for windowing is quite complicated, which increases the probability of introducing a bug into the base functionality.

Why don't we have a "windowing wrapper" that wraps a taskmodule and extends it with windowing functionality and different windowing strategies but relies on the general logic of the wrapped taskmodule? This allows us to separate windowing from encoding / decoding logic, makes the code much cleaner and reusable for other taskmodules of the same task. If this is not possible the current abstraction should be changed to support it. -- Well keep it like this for now and come back to this at a later date.

@ArneBinder ArneBinder force-pushed the windowing_for_token_classification branch from 59983b2 to 09fd9e8 Compare February 23, 2022 11:42
@ChristophAlt ChristophAlt merged commit a426568 into main Feb 23, 2022
@ArneBinder ArneBinder deleted the windowing_for_token_classification branch February 23, 2022 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants