Windowing for token classification #72
Conversation
It looks good. However, I'm still not convinced that adding this windowing and partitioning to the taskmodule is a good idea. It considerably increases code complexity, and all the offset magic makes me wonder how many edge cases remain.
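To make the concern concrete, here is a minimal sketch (not the taskmodule's actual code, all names are hypothetical) of the kind of "offset magic" windowing entails: splitting a token sequence into overlapping windows and mapping window-local span predictions back to document offsets. The off-by-one opportunities live exactly in these two functions.

```python
def make_windows(tokens, max_length, stride):
    """Yield (window_start, window) pairs of at most max_length tokens,
    with consecutive windows overlapping by `stride` tokens."""
    start = 0
    while start < len(tokens):
        yield start, tokens[start:start + max_length]
        if start + max_length >= len(tokens):
            break  # last window already covers the end of the document
        start += max_length - stride

def to_document_span(window_start, local_start, local_end):
    """Map a span predicted inside a window back to document coordinates."""
    return window_start + local_start, window_start + local_end

tokens = list(range(10))  # stand-in for token ids
windows = list(make_windows(tokens, max_length=4, stride=1))
# windows: [(0, [0, 1, 2, 3]), (3, [3, 4, 5, 6]), (6, [6, 7, 8, 9])]
```

Note that spans crossing a window boundary and duplicate predictions in the overlap region still need extra handling, which is where most of the edge cases hide.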
The more I think about it, the more I'm convinced that it's a bad idea to integrate windowing into the same taskmodule that provides the "base" functionality. It just shouldn't be there: it operates on a completely different level of abstraction, and it would require us to replicate the same functionality over and over across current and future taskmodules. The windowing logic is also quite complicated, which increases the probability of introducing a bug into the base functionality.

Why don't we have a "windowing wrapper" that wraps a taskmodule and extends it with windowing functionality and different windowing strategies, but relies on the general logic of the wrapped taskmodule? This would separate windowing from the encoding/decoding logic, make the code much cleaner, and make it reusable for other taskmodules of the same task. If this is not possible, the current abstraction should be changed to support it.

-- We'll keep it like this for now and come back to this at a later date.
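The wrapper idea above could look roughly like this. This is only a sketch of the pattern, not the actual pytorch-ie API: the class and method names (`WindowingWrapper`, `encode`, `decode`, the `window_fn`/`merge_fn` strategies) are hypothetical.

```python
class WindowingWrapper:
    """Wraps a taskmodule and adds windowing on top of its encode/decode logic.

    window_fn: splits a document into partial documents (the windowing strategy).
    merge_fn:  merges per-window decoded results back into document-level results.
    """

    def __init__(self, taskmodule, window_fn, merge_fn):
        self.taskmodule = taskmodule  # provides the base encode/decode logic
        self.window_fn = window_fn
        self.merge_fn = merge_fn

    def encode(self, document):
        # Windowing happens here; the actual encoding is fully delegated,
        # so the base taskmodule stays free of windowing logic.
        return [self.taskmodule.encode(part) for part in self.window_fn(document)]

    def decode(self, encoded_parts):
        parts = [self.taskmodule.decode(part) for part in encoded_parts]
        return self.merge_fn(parts)
```

With this shape, a new windowing strategy is just a different `window_fn`/`merge_fn` pair, and any taskmodule for the same task can be wrapped without touching its code.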
…quence() and _encode_text() to ease testing
force-pushed from 59983b2 to 09fd9e8
This is fully implemented, but it requires #71 to be merged first (this PR is against main, so we hopefully won't have any issues with that).

EDIT: This is fully functional; it is currently training on the SciArg corpus.