[MRG] Fix batch issue when generating features + add sample_weight in deep models #220
Conversation
I'm wondering whether you could just create a dataloader on your own, without using the skorch function: create a dataloader over X, iterate on it, and that's it. The current approach seems very long for something simple, but maybe I'm wrong. (See the sketch below.)
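A minimal sketch of what this suggestion could look like, assuming the module returns the features directly; the function name, batch size, and device handling here are illustrative, not part of skada's API:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def predict_features_simple(module, X, batch_size=128, device="cpu"):
    """Hypothetical helper: extract features batch by batch with a plain DataLoader."""
    module.eval()
    loader = DataLoader(TensorDataset(torch.as_tensor(X)), batch_size=batch_size)
    feats = []
    with torch.no_grad():
        for (xb,) in loader:  # TensorDataset yields 1-tuples
            feats.append(module(xb.to(device)).cpu())
    return torch.cat(feats).numpy()
```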
sample_weight is everywhere, but in practice we only use it to reweight the loss, right? IMO, it should be used only when computing the loss, but maybe I misunderstood something.
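For illustration, reweighting only at the loss could look like the sketch below: compute the unreduced per-sample loss, then scale by sample_weight and average. This is a hedged example of the pattern the comment describes, not the PR's actual implementation:

```python
import torch
import torch.nn.functional as F

def weighted_loss(y_pred, y_true, sample_weight):
    """Per-sample cross-entropy, reweighted then averaged."""
    per_sample = F.cross_entropy(y_pred, y_true, reduction="none")
    return (per_sample * sample_weight).mean()
```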
If you run
coverage run -m pytest -v -s && coverage html && open htmlcov/index.html
on your branch and on the main branch, you will see that you have added 24 lines in skada/deep/base.py that are not covered by tests.
From the skorch FAQ: when X is a dict, its keys are passed as kwargs to the module's forward method.
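A small sketch of that skorch behaviour, assuming a toy module; MyModule and n_features are illustrative names, and the fit call is shown commented out:

```python
import torch
from torch import nn
from skorch import NeuralNetClassifier

class MyModule(nn.Module):
    def __init__(self, n_features=20):
        super().__init__()
        self.lin = nn.Linear(n_features, 2)

    def forward(self, X, sample_weight=None):
        # `sample_weight` arrives here because it is a key of the input dict.
        # log_softmax pairs with skorch's default NLLLoss criterion.
        return torch.log_softmax(self.lin(X), dim=-1)

net = NeuralNetClassifier(MyModule)
# net.fit({"X": X, "sample_weight": w}, y)  # dict keys match forward()'s arguments
```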
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #220      +/-   ##
==========================================
- Coverage   97.01%   96.31%   -0.70%
==========================================
  Files          54       54
  Lines        5429     5486      +57
==========================================
+ Hits         5267     5284      +17
- Misses        162      202      +40
Very good PR, the only thing missing is to also modify …
Previously, when calling model.predict_features(X), we passed the whole input as a single batch.
This caused CUDA out-of-memory issues when working with big datasets.
So here I tried to mimic the batching behaviour of skorch models; see the sketch below.
This might not be the best way to fix the issue, though; I'd love your opinion @tgnassou
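One way to batch this, sketched under the assumption that the features can be read off the module's per-batch output, is to reuse skorch's documented forward_iter, which yields the module output one batch at a time; the function name below is hypothetical and only mimics what this PR does:

```python
import torch

def predict_features_batched(net, X):
    """Concatenate per-batch outputs instead of one giant CUDA batch."""
    outs = []
    for batch_out in net.forward_iter(X, training=False, device="cpu"):
        # forward_iter yields a tuple when the module returns several
        # tensors; take the first element as the features in that case.
        feats = batch_out[0] if isinstance(batch_out, tuple) else batch_out
        outs.append(feats)
    return torch.cat(outs).numpy()
```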