[MRG] Fix batch issue when generating features + add sample_weight in deep models #220
Conversation
I'm wondering whether you could just create a dataloader on your own, without using the skorch function: create a dataloader over X, iterate on it, and that's it. The current approach seems very long for something simple, but maybe I'm wrong. (See the sketch below.)
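A minimal sketch of what this suggestion could look like, assuming the module returns the features directly; the function name, batch size, and device handling here are illustrative, not part of skada's API:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def predict_features_simple(module, X, batch_size=128, device="cpu"):
    """Hypothetical helper: extract features batch by batch with a plain DataLoader."""
    module.eval()
    loader = DataLoader(TensorDataset(torch.as_tensor(X)), batch_size=batch_size)
    feats = []
    with torch.no_grad():
        for (xb,) in loader:  # TensorDataset yields 1-tuples
            feats.append(module(xb.to(device)).cpu())
    return torch.cat(feats).numpy()
```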
sample_weight is everywhere, but in practice we only use it to reweight the loss, right? IMO, it should be used only when computing the loss, but maybe I misunderstood something.
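For illustration, reweighting only at the loss could look like the sketch below: compute the unreduced per-sample loss, then scale by sample_weight and average. This is a hedged example of the pattern the comment describes, not the PR's actual implementation:

```python
import torch
import torch.nn.functional as F

def weighted_loss(y_pred, y_true, sample_weight):
    """Per-sample cross-entropy, reweighted then averaged."""
    per_sample = F.cross_entropy(y_pred, y_true, reduction="none")
    return (per_sample * sample_weight).mean()
```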
If you run
coverage run -m pytest -v -s && coverage html && open htmlcov/index.html
on your branch and on the main branch, you will see that you have added 24 lines in skada/deep/base.py that are not covered by tests.
From the skorch FAQ: when X is a dict, its keys are passed as kwargs to the module's forward method.
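A small sketch of that skorch behaviour, assuming a toy module; MyModule and n_features are illustrative names, and the fit call is shown commented out:

```python
import torch
from torch import nn
from skorch import NeuralNetClassifier

class MyModule(nn.Module):
    def __init__(self, n_features=20):
        super().__init__()
        self.lin = nn.Linear(n_features, 2)

    def forward(self, X, sample_weight=None):
        # `sample_weight` arrives here because it is a key of the input dict.
        # log_softmax pairs with skorch's default NLLLoss criterion.
        return torch.log_softmax(self.lin(X), dim=-1)

net = NeuralNetClassifier(MyModule)
# net.fit({"X": X, "sample_weight": w}, y)  # dict keys match forward()'s arguments
```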
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #220      +/-   ##
==========================================
- Coverage   97.01%   96.31%   -0.70%
==========================================
  Files          54       54
  Lines        5429     5486      +57
==========================================
+ Hits         5267     5284      +17
- Misses        162      202      +40
Very good PR, the only thing missing is to also modify …
Previously, when calling model.predict_features(X), we passed the whole input as a single batch.
This caused CUDA out-of-memory issues when working with big datasets.
So here I tried to mimic the batching behaviour of skorch models; see the sketch below.
This might not be the best way to fix the issue, though; I'd love your opinion @tgnassou
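One way to batch this, sketched under the assumption that the features can be read off the module's per-batch output, is to reuse skorch's documented forward_iter, which yields the module output one batch at a time; the function name below is hypothetical and only mimics what this PR does:

```python
import torch

def predict_features_batched(net, X):
    """Concatenate per-batch outputs instead of one giant CUDA batch."""
    outs = []
    for batch_out in net.forward_iter(X, training=False, device="cpu"):
        # forward_iter yields a tuple when the module returns several
        # tensors; take the first element as the features in that case.
        feats = batch_out[0] if isinstance(batch_out, tuple) else batch_out
        outs.append(feats)
    return torch.cat(outs).numpy()
```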