[MRG] Make sure all API methods accept sample_domain as None #53

YanisLalou · 2024-01-12T09:55:27Z

Issue #17

Change default value for allow_auto_sample_domain in check_X_y_domain
Check that y is masked correctly in check_X_y_domain

codecov · 2024-01-12T09:59:15Z

Codecov Report

Attention: 15 lines in your changes are missing coverage. Please review.

Comparison is base (895f862) 84.43% compared to head (8f3d286) 86.16%.

❗ Current head 8f3d286 differs from pull request most recent head 2f428a8. Consider uploading reports for the commit 2f428a8 to get more accurate results

Files	Patch %	Lines
skada/utils.py	83.07%	11 Missing ⚠️
skada/metrics.py	75.00%	3 Missing ⚠️
skada/_utils.py	94.73%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #53      +/-   ##
==========================================
+ Coverage   84.43%   86.16%   +1.72%     
==========================================
  Files          35       37       +2     
  Lines        2191     2334     +143     
==========================================
+ Hits         1850     2011     +161     
+ Misses        341      323      -18


kachayev · 2024-01-12T11:51:19Z

skada/tests/test_utils.py

+
+def test_check_y_masking_regression():
+    y_properly_masked = np.array([np.nan, 1, 2.5, -1, np.nan, 0, -1.5])
+    y_wrongfuly_masked = np.array([-1, -2, 2.5, -1, 2, 0, 1])


I guess the type of this array is 'float', right? In this case we should assume it's a regression task with no labels being masked 🤔

YanisLalou · 2024-01-15

I don't get it. Isn't it possible to have masked arrays for regression tasks?

kachayev · 2024-01-15

It's possible to mask regression targets using NaNs. The variable called y_wrongfuly_masked doesn't hold a 'wrongly' masked array; it holds a non-masked array. That's why I was confused about the name.
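
To make the distinction in this thread concrete, here is a minimal sketch. It assumes that a float dtype signals a regression task where only NaN counts as a mask, and that -1 is the mask label for classification. The helper `count_masked` is hypothetical and is not the function under review.

```python
import numpy as np


# Hypothetical helper for illustration only; not part of the reviewed code.
def count_masked(y, masked_label=-1):
    """Count masked entries: NaN for float targets, `masked_label` otherwise."""
    y = np.asarray(y)
    if np.issubdtype(y.dtype, np.floating):
        # Assumption: float targets mean regression, where only NaN is a mask.
        return int(np.isnan(y).sum())
    # Assumption: integer targets mean classification with a sentinel label.
    return int((y == masked_label).sum())


y_properly_masked = np.array([np.nan, 1, 2.5, -1, np.nan, 0, -1.5])
y_wrongfuly_masked = np.array([-1, -2, 2.5, -1, 2, 0, 1])
y_classification = np.array([0, 1, -1, 1, -1])

n_proper = count_masked(y_properly_masked)
n_wrong = count_masked(y_wrongfuly_masked)
n_classification = count_masked(y_classification)
```

Under these assumptions the second array has no masked entries at all, because the 2.5 makes it a float array and its -1 values are ordinary regression targets.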

kachayev · 2024-01-12T13:45:40Z

skada/_utils.py

@@ -59,11 +59,18 @@ def check_X_y_domain(
    X = check_array(X, input_name='X', allow_nd=allow_nd)
    y = check_array(y, force_all_finite=True, ensure_2d=False, input_name='y')
    check_consistent_length(X, y)
-    if sample_domain is None and allow_auto_sample_domain:
+    if sample_domain is None and not allow_auto_sample_domain:


We need to make sure that all of those clauses are properly covered with unit tests. I guess just updating check_X_y_domain_exceptions with a bunch of correct and incorrect inputs would do.
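
A rough sketch of what such a table of correct and incorrect inputs could look like. The function `check_inputs` below is a simplified stand-in written for this example (it only mirrors the `sample_domain is None and not allow_auto_sample_domain` clause plus length checks), and the choice of `ValueError` is an assumption, not the library's confirmed behavior.

```python
import numpy as np


# Simplified stand-in for the validator under test (hypothetical).
def check_inputs(X, y, sample_domain=None, allow_auto_sample_domain=True):
    X, y = np.asarray(X), np.asarray(y)
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y have inconsistent lengths")
    if sample_domain is None:
        if not allow_auto_sample_domain:
            raise ValueError("sample_domain is required")
        # Auto-generated labels: every sample is treated as a source here.
        sample_domain = np.ones(y.shape[0], dtype=int)
    sample_domain = np.asarray(sample_domain)
    if sample_domain.shape[0] != X.shape[0]:
        raise ValueError("X and sample_domain have inconsistent lengths")
    return X, y, sample_domain


def raises_value_error(**kwargs):
    """Return True if the stand-in validator rejects the given inputs."""
    try:
        check_inputs(**kwargs)
    except ValueError:
        return True
    return False


X = np.zeros((4, 2))
y = np.array([0, 1, 0, 1])

# Each case pairs keyword arguments with whether an error is expected.
cases = [
    (dict(X=X, y=y), False),
    (dict(X=X, y=y, sample_domain=[1, 1, -2, -2]), False),
    (dict(X=X, y=y, allow_auto_sample_domain=False), True),
    (dict(X=X, y=y[:3]), True),
    (dict(X=X, y=y, sample_domain=[1, -2]), True),
]
results = [raises_value_error(**kwargs) == expected for kwargs, expected in cases]
```

In a real test module the same table would fit naturally into a parametrized test, one case per clause.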

kachayev · 2024-01-12T13:49:00Z

skada/_utils.py

        sample_domain = np.ones_like(y)
        # labels masked with -1 are recognized as targets,
        # the rest is treated as a source
-        sample_domain[y == -1] = -2
+        if y_type == 'classification':
+            sample_domain[y == -1] = -2


I always try to avoid using 'magic' numbers in the code, as they can be extremely difficult to understand and modify later on. Let's create a constant named _DEFAULT_TARGET_DOMAIN_LABEL = -2 (or something like this) in the current module namespace.

kachayev · 2024-01-12T13:49:37Z

skada/_utils.py

+        if y_type == 'classification':
+            sample_domain[y == -1] = -2
+        else:
+            sample_domain[np.isnan(y)] = -2


Same constant here.

kachayev · 2024-01-12T13:50:29Z

skada/_utils.py

@@ -114,13 +121,17 @@ def check_X_domain(
    return_indices: bool = False,
    # xxx(okachaiev): most likely this needs to be removed as it doesn't fit new API
    return_joint: bool = True,
-    allow_auto_sample_domain: bool = False,
+    allow_auto_sample_domain: bool = True,


Let's put a docstring for this function.

kachayev · 2024-01-12T14:35:12Z

@YanisLalou I can't seem to push the update from my local branch here (not sure why, GitHub previously allowed me to do so). Would you please run flake8 locally and fix all of its complaints?

rflamary

A few comments; maybe we should discuss them in the chat.

rflamary · 2024-01-17T09:20:32Z

skada/_utils.py

@@ -40,30 +51,87 @@ def _estimate_covariance(X, shrinkage):
 def check_X_y_domain(
    X,
    y,
-    sample_domain,
+    sample_domain=None,
    allow_source: bool = True,
    allow_multi_source: bool = True,
    allow_target: bool = True,
    allow_multi_target: bool = True,
    return_indices: bool = False,
    # xxx(okachaiev): most likely this needs to be removed as it doesn't fit new API


If we should remove something, maybe do it while we are working on this function?

kachayev · 2024-01-17

I bet there's a separate task for this)

YanisLalou · 2024-01-18T12:53:12Z

⬆️ Issue #58

Change default value for allow_auto_sample_domain + check if the masking of y is well done in the check_X_y_domain function

67841e9

YanisLalou added 2 commits January 12, 2024 11:58

Fix + Test _check_y_masking function

d178f6c

Add raise Exception to check_X_domain + Test

f1c1c5d

YanisLalou requested a review from kachayev January 12, 2024 13:38

YanisLalou self-assigned this Jan 12, 2024

YanisLalou added the domain-aware api label Jan 12, 2024

kachayev changed the title ~~[WIP] Make sure all API methods accept sample_domain as None~~ Make sure all API methods accept sample_domain as None Jan 12, 2024

kachayev reviewed Jan 12, 2024

Merge branch 'main' into Issue_17_branch

07b8846

YanisLalou and others added 4 commits January 12, 2024 15:50

Flake8 warnings

728a706

Merge branch 'main' into Issue_17_branch

c9a8a94

Docstring + Tests for exceptions handling

1d88ccf

varible name change

9e8042c

kachayev reviewed Jan 16, 2024

skada/_utils.py (outdated, resolved)

kachayev reviewed Jan 16, 2024

skada/_utils.py (outdated, resolved)

Add Global variables

f8ec8e9

rflamary reviewed Jan 17, 2024

Split check_X_Y_domain and check_X_domain to more functions

b124290

YanisLalou added 2 commits January 18, 2024 14:04

fix typo error

68de2e6

Add test for extract_source_indices(), split_source_target_X_y(), split_source_target_X()

f633160

YanisLalou requested a review from rflamary January 18, 2024 13:49

Merge branch 'main' into Issue_17_branch

c4f5627

kachayev requested changes Jan 19, 2024

skada/tests/test_utils.py (outdated, resolved)

skada/utils.py (outdated, resolved)

YanisLalou and others added 4 commits January 24, 2024 15:34

Rename test name

1d0b0ec

Merge branch 'main' into Issue_17_branch

85ec64c

Changement on the source_target_split function --> Can accept now *arrays + output the data the same way as sklearn

06f2390

Fix plot_shifted_dataset.py

8d22250

kachayev and others added 3 commits January 25, 2024 14:57

Merge branch 'main' into Issue_17_branch

900d670

fix plot_dataset_from_moons_distribution.py

4d3e192

remove unwanted changes to metrics.py

8f3d286

YanisLalou changed the title ~~Make sure all API methods accept sample_domain as None~~ [TO_REVIEW] Make sure all API methods accept sample_domain as None Jan 25, 2024

YanisLalou requested a review from kachayev January 25, 2024 14:22

kachayev requested changes Jan 25, 2024

remove old comments + flake8

2f428a8

kachayev changed the title ~~[TO_REVIEW] Make sure all API methods accept sample_domain as None~~ [MRG] Make sure all API methods accept sample_domain as None Jan 26, 2024

kachayev approved these changes Jan 26, 2024

kachayev merged commit d4eb894 into scikit-adaptation:main Jan 26, 2024
4 checks passed

kachayev mentioned this pull request Jan 26, 2024

Split helper functions for input validation and reshaping #58

Closed

YanisLalou deleted the Issue_17_branch branch January 26, 2024 13:41

YanisLalou mentioned this pull request Jan 29, 2024

Cleanup check_*_domain utils #4

Closed
