Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TO_REVIEW] Switch allow_source to True by default #64

Merged
merged 6 commits into from
Feb 5, 2024

Conversation

YanisLalou
Copy link
Collaborator

I had ValueErrors raised when running this code:

pipeline = make_da_pipeline(
    PCA(n_components=2),
    GaussianReweightDensityAdapter(),
    LogisticRegression()
)

X, y, sample_domain = domain_dataset.pack_lodo()

cv = LeaveOneDomainOut(max_n_splits=len(domain_dataset.domain_names_), test_size=0.3, random_state=0)
scores = cross_validate(
    pipeline,
    X,
    y,
    cv=cv,
    params={'sample_domain': sample_domain},
)

I was because the allow_source during the Check_X_source was set to False by default which shouldn't be the case.

@YanisLalou YanisLalou changed the title Switch allow_source to True by default [TO_REVIEW] Switch allow_source to True by default Jan 30, 2024
@kachayev
Copy link
Collaborator

We talked about this during the call: predict on source might be an ill-defined operation. Even when it is well-defined, performing predict on the source should not be a 'common' thing to do. Certainly requires an explicit intent to do so.

@YanisLalou
Copy link
Collaborator Author

Oh yes right, well in that case we should set to False as the default value of allow_source in every function, right ?.

Also at which point in the pipeline can we set the allow_source to True if the user want it?
I tried to use the arg params={'sample_domain': sample_domain} at the initialisation of the estimator / the scorer, and in the cross_validate(), and each time I get either a TypeError: __init__() got an unexpected keyword argument 'params'
either a TypeError: adapt() got an unexpected keyword argument 'allow_source'.

@kachayev
Copy link
Collaborator

For the pipeline you just pass it into the function that you call, like predict for example. The routing mechanism is there to make sure that the parameter goes where it's due.

@kachayev
Copy link
Collaborator

I'm not sure that has to be False for all cases, I need to refresh my memory of what those cases are. It seems to me we had a form of 'fit_transform' that was designed for adapters fitting but I'm not sure this functionality survived the latest re-write of the API.

@tgnassou
Copy link
Collaborator

The allow_source parameter that Yanis talks about is the one for transform function. So we need the source domain here. I don't understand why we could potentially not have access to source domain.

@kachayev
Copy link
Collaborator

kachayev commented Feb 1, 2024

So the logic in the original code was the following:

  • transform by default does not allow the source (allow_source=False)
  • fit_transform is implemented in the base class to avoid default .fit().transform() pass-through, and this one does explicitly state allow_source=True

The idea is that fit_transform is happening during fitting, when transform itself is only called after that. During the fitting we use sources (for sure), after that, by default, we should not. Or we should be explicit about trying to do so. Let me know if that makes sense @tgnassou

Copy link

codecov bot commented Feb 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (268c018) 86.24% compared to head (1e1cb4e) 86.12%.
Report is 1 commits behind head on main.

❗ Current head 1e1cb4e differs from pull request most recent head dc4c41a. Consider uploading reports for the commit dc4c41a to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #64      +/-   ##
==========================================
- Coverage   86.24%   86.12%   -0.13%     
==========================================
  Files          37       37              
  Lines        2334     2328       -6     
==========================================
- Hits         2013     2005       -8     
- Misses        321      323       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kachayev
Copy link
Collaborator

kachayev commented Feb 5, 2024

This one is ready to be merged, right @YanisLalou?

@YanisLalou
Copy link
Collaborator Author

This one is ready to be merged, right @YanisLalou?

I just reformulated something to make it clearer. Now it is ready to be merged

@kachayev kachayev merged commit 1f51b04 into scikit-adaptation:main Feb 5, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants