[MRG] Regression label for 2d classification data generation #69

Merged
merged 20 commits into from
Feb 7, 2024

Conversation

BuenoRuben
Contributor

@BuenoRuben BuenoRuben commented Feb 1, 2024

In this branch, I added the 'regression' label to both _generate_data_2d_classif and _generate_data_2d_classif_subspace in skada/datasets/_samples_generator.py, added a new example for this label (examples/datasets/plot_shifted_dataset_regression.py), and added a new test, test_make_shifted_datasets_regression, in skada/datasets/tests/test_samples_generator.py.

The main bug fixed here was caused by the y.astype(int) in the return statement, which truncated the float values to integers when using regression labels.
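
To illustrate the issue described above, here is a minimal NumPy sketch. The label values and the `label` variable are made up for illustration and are not taken from the PR. It shows how `astype(int)` discards the fractional part of continuous labels, and one possible way to guard against it:

```python
import numpy as np

# Hypothetical continuous regression labels (illustrative values only).
y = np.array([0.25, 1.75, -0.6, 2.0])

# Casting to int truncates toward zero, so the regression targets are lost.
y_cast = y.astype(int)

# One possible guard: only cast when the labels are meant to be discrete.
label = "regression"  # assumed name of the label-mode argument
y_out = y if label == "regression" else y.astype(int)
```

With this guard, classification labels are still returned as integers while regression labels keep their floating-point dtype.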
@BuenoRuben BuenoRuben changed the title Regression label for 2d classification data generation [WIP] Regression label for 2d classification data generation Feb 1, 2024
@BuenoRuben BuenoRuben changed the title [WIP] Regression label for 2d classification data generation [MRG] Regression label for 2d classification data generation Feb 1, 2024
Collaborator

@tgnassou tgnassou left a comment

I think the code I gave you also had a regression mode for the subspace dataset; it would be nice to have it too.

@tgnassou
Collaborator

tgnassou commented Feb 1, 2024

Your tests are not passing

@BuenoRuben BuenoRuben changed the title [MRG] Regression label for 2d classification data generation [WIP] Regression label for 2d classification data generation Feb 1, 2024
the test was using a method that has been changed, thus raising errors
the two tests that got removed were checking that the y-values were between 0 and 1, which should not necessarily be the case in regression
I just changed the size of the colorbar so that we have better looking plots next to it
now _generate_data_2d_classif_subspace uses the label that has been given as a parameter instead of "binary" every time; additionally, the example for the regression label uses the subspace shift
the main issue was that the y values that were generated weren't of the correct size (not the same as the X values)
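
The size mismatch mentioned in the last commit message is the kind of error a simple shape check can catch. The following is a minimal sketch using a hypothetical stand-in generator (`make_toy_regression_data` is not part of skada), only meant to show which properties a regression test could verify instead of a [0, 1] range check:

```python
import numpy as np


def make_toy_regression_data(n_samples=20, seed=0):
    """Hypothetical stand-in for a 2D data generator (not skada's API)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 2))
    # One continuous label per sample, derived from the first feature.
    y = X[:, 0] + 0.1 * rng.normal(size=n_samples)
    return X, y


X, y = make_toy_regression_data()

# Each sample must have exactly one label.
assert X.shape[0] == y.shape[0]
# Regression labels stay floating point and need not lie in [0, 1].
assert np.issubdtype(y.dtype, np.floating)
```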
@BuenoRuben BuenoRuben changed the title [WIP] Regression label for 2d classification data generation [MRG] Regression label for 2d classification data generation Feb 2, 2024

codecov bot commented Feb 2, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (5fe1df9) 86.82% compared to head (37aa429) 87.16%.
Report is 1 commits behind head on main.

Files                                   Patch %   Lines
skada/datasets/_samples_generator.py    95.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #69      +/-   ##
==========================================
+ Coverage   86.82%   87.16%   +0.34%     
==========================================
  Files          38       38              
  Lines        2398     2439      +41     
==========================================
+ Hits         2082     2126      +44     
+ Misses        316      313       -3     


BuenoRuben and others added 10 commits February 2, 2024 15:19
this test should cover the changes to generate_data_2d_classif_subspace when using the 'multiclass' or 'regression' label
with subset shift, the values are half as large for the default case
this was needed with the previous changes
elif label == 'multiclass':
    y = np.zeros(n1)
    for i in range(4):
        y = np.concatenate((y, (i + 1) * np.ones(n2)), 0)
    y = y.astype(int)
elif label == 'regression':
    # create label y with gaussian distribution
Collaborator

Maybe it would be nice to be able to set the mu and the Sigma1 as we want. So just add them to the parameters of the function with default values.

Collaborator

And change the names to mu_regression and sigma_regression
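
As a rough illustration of the suggestion in this review thread, the sketch below exposes `mu_regression` and `sigma_regression` as keyword arguments with default values. The helper name `regression_labels`, the default values, and the choice of labelling each point with a 2D Gaussian density are all assumptions made for the example; the PR excerpt only says the labels are created "with gaussian distribution".

```python
import numpy as np


def regression_labels(X, mu_regression=None, sigma_regression=None):
    """Hypothetical helper: label each 2D point with a Gaussian density value."""
    # Assumed defaults for illustration; the PR does not state the actual ones.
    mu = np.zeros(2) if mu_regression is None else np.asarray(mu_regression, dtype=float)
    sigma = np.eye(2) if sigma_regression is None else np.asarray(sigma_regression, dtype=float)

    diff = X - mu
    # Squared Mahalanobis distance of every point to mu.
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)
    # Normalisation constant of a 2D Gaussian density.
    norm = 2 * np.pi * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * maha) / norm


X = np.array([[0.0, 0.0], [1.0, 0.0]])
y_default = regression_labels(X)
y_shifted = regression_labels(X, mu_regression=[1.0, 0.0])
```

Using `None` as the default and filling in the arrays inside the function avoids mutable default arguments while still letting callers override either parameter.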

    for i in range(k):
        y = np.concatenate((y, (i + 1) * np.ones(n1//k)), 0)
    y = y.astype(int)
elif label == 'regression':
Collaborator

same here

@tgnassou tgnassou merged commit 8e8fc7b into scikit-adaptation:main Feb 7, 2024
7 checks passed
2 participants