[MRG] Regression label for 2d classification data generation #69
The main bug fixed here was caused by the y.astype(int) in the return statement, which truncated the float values to integers when using regression labels.
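To illustrate the problem described above, here is a minimal sketch with made-up label values. The helper finalize_labels is hypothetical and is not the code from this PR; it only shows the idea of skipping the integer cast when the label mode is 'regression'.

```python
import numpy as np

# Hypothetical continuous regression labels (not taken from the PR).
y = np.array([0.7, -1.3, 2.9])

# Casting to int truncates toward zero, so the fractional part is lost.
y_cast = y.astype(int)


def finalize_labels(y, label):
    """Hypothetical helper: cast to int only for categorical labels."""
    if label == "regression":
        return y
    return y.astype(int)


# Regression labels keep their float dtype and values.
y_kept = finalize_labels(y, "regression")
```

With this kind of guard, the 'binary' and 'multiclass' modes still return integer labels, while 'regression' returns the floats untouched.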
I think the code I gave you also had a regression mode for the subspace dataset; it would be nice to have it too.
Your tests are not passing.
The test was using a method that has been changed, which raised errors.
The two tests that were removed checked that the y-values were between 0 and 1, which is not necessarily the case in regression.
I just changed the size of the colorbar so that the plots next to it look better.
_generate_data_2d_classif_subspace now uses the label passed as a parameter instead of always using "binary". Additionally, the example for the regression label now uses the subspace shift.
The main issue was that the generated y values did not have the correct size (not the same as the X values).
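The points raised in the comments above (y must match X in length, and regression targets need not lie between 0 and 1) can be sketched as a small check. The function check_regression_dataset and the synthetic data are assumptions made for illustration; they are not the tests added in this PR.

```python
import numpy as np


def check_regression_dataset(X, y):
    """Hypothetical checks for a regression-labelled dataset."""
    X = np.asarray(X)
    y = np.asarray(y)
    # One label per sample: y must have as many entries as X has rows.
    assert y.shape == (X.shape[0],), "X and y have inconsistent lengths"
    # Regression labels are continuous, so they must not be cast to int.
    assert np.issubdtype(y.dtype, np.floating), "labels should be floats"
    # Deliberately no range check: regression targets need not lie in [0, 1].
    return True


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 3.0 * X[:, 0] - 1.5  # stand-in targets, outside [0, 1] on purpose
ok = check_regression_dataset(X, y)
```

A check like this would catch a size mismatch between X and y without reintroducing the range assertion that was removed.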
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #69      +/-   ##
==========================================
+ Coverage   86.82%   87.16%   +0.34%
==========================================
  Files          38       38
  Lines        2398     2439      +41
==========================================
+ Hits         2082     2126      +44
+ Misses        316      313       -3
This test should cover the changes to generate_data_2d_classif_subspace when using the 'multiclass' or 'regression' label.
With subset shift, the values are half as large in the default case.
This was needed with the previous changes.
elif label == 'multiclass':
    y = np.zeros(n1)
    for i in range(4):
        y = np.concatenate((y, (i + 1) * np.ones(n2)), 0)
    y = y.astype(int)
elif label == 'regression':
    # create label y with gaussian distribution
Maybe it would be nice to be able to modify mu and Sigma1 as we want. Just put them in the parameters of the function with default values.
And change the names to mu_regression and sigma_regression.
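One way this suggestion could look is sketched below. Only the parameter names mu_regression and sigma_regression come from the review; the function name regression_labels, the default values (zero mean, identity covariance), and the unnormalized Gaussian formula are assumptions, since the actual label computation is not shown in this conversation.

```python
import numpy as np


def regression_labels(X, mu_regression=None, sigma_regression=None):
    """Hypothetical sketch: label each 2D point with a Gaussian value.

    The defaults (zero mean, identity covariance) are assumptions.
    """
    X = np.asarray(X, dtype=float)
    if mu_regression is None:
        mu_regression = np.zeros(2)
    if sigma_regression is None:
        sigma_regression = np.eye(2)
    mu = np.asarray(mu_regression, dtype=float)
    sigma = np.asarray(sigma_regression, dtype=float)
    diff = X - mu
    # Squared Mahalanobis distance of every row of X to mu.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)
    # Unnormalized Gaussian: equals 1 at mu and decays away from it.
    return np.exp(-0.5 * d2)


# Default parameters, then a user-chosen mean and covariance.
y_default = regression_labels(np.zeros((3, 2)))
y_custom = regression_labels(
    [[1.0, 0.0]], mu_regression=[1.0, 0.0], sigma_regression=2 * np.eye(2)
)
```

Using None as the default and filling in the arrays inside the function avoids mutable default arguments while still letting callers override both values.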
    for i in range(k):
        y = np.concatenate((y, (i + 1) * np.ones(n1 // k)), 0)
    y = y.astype(int)
elif label == 'regression':
same here
In this branch, I added the 'regression' label to both _generate_data_2d_classif and _generate_data_2d_classif_subspace in skada/datasets/_sample_generator.py, added a new example for this label (examples/datasets/plot_shifted_dataset_regression.py), and added a new test, test_make_shifted_datasets_regression, in skada\datasets\tests\test_samples_generator.py.