
Weights normalization in ReweightDensity #112

Closed

antoinedemathelin opened this issue Feb 27, 2024 · 8 comments

Comments

@antoinedemathelin
Contributor

Hi everyone,
I wonder why the weights are normalized in ReweightDensity, cf.:

source_weights /= source_weights.sum()

After some tests, I observed that the strange behavior of ReweightDensity in the comparison example is due to the weight normalization, cf. https://scikit-adaptation.github.io/auto_examples/plot_method_comparison.html#sphx-glr-auto-examples-plot-method-comparison-py

Maybe we should remove the normalization, or at least divide by the mean instead of the sum (dividing by the sum makes the weights depend on the dataset size, which should be avoided, I think). See the sketch below.
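To illustrate the scale issue (a quick standalone sketch, not code from skada): sum-normalized weights shrink like 1/n, while mean-normalized weights stay around one regardless of the dataset size.

import numpy as np

rng = np.random.RandomState(0)
for n in (100, 10_000):
    w = rng.rand(n) + 0.5  # arbitrary positive raw weights
    # sum-normalized weights average to 1/n; mean-normalized weights average to 1
    print(n, (w / w.sum()).mean(), (w / w.mean()).mean())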

@BuenoRuben
Contributor

You are right: when comparing the weights obtained with this method to those from the others, it is the only one normalized in such a way (all other reweighting methods have some weights bigger than 1), so it might be better not to normalize it.

@kachayev
Collaborator

From the code, it seems like the normalization here is specifically to produce a density, right? DiscriminatorReweightDensityAdapter also returns a density (normalized), it just does so by calling predict_proba on the classifier.

@BuenoRuben
Contributor

maybe it is a mistake, and we shouldn't produce a density in either case:
those are the two worst-performing reweighting methods, doing worse on the example than when not using DA at all
(see: https://output.circle-artifacts.com/output/job/8955b99a-a58c-4b5c-afb0-8bcb4c545245/artifacts/0/dev/auto_examples/plot_reweighting.html#sphx-glr-auto-examples-plot-reweighting-py )
also, I think the weights are not supposed to form a density; they are just supposed to be the ratio of two densities, and can thus be higher than 1 and sum to more than 1

@antoinedemathelin
Contributor Author

antoinedemathelin commented Feb 28, 2024

Hi,
In fact, the optimal weights should be equal to the density ratio w(x) = p_t(x) / p_s(x).
So, when the number of source samples is large, the weights should approximately average to one (and not sum to one).
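This follows from taking the expectation of the density ratio under the source distribution (a standard identity, added here for reference):

\mathbb{E}_{x \sim p_s}\left[w(x)\right] = \int p_s(x)\,\frac{p_t(x)}{p_s(x)}\,dx = \int p_t(x)\,dx = 1

so the empirical mean of the weights over a large source sample should be close to one.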

Moreover, from a scikit-learn implementation perspective, uniform weighting is equivalent to all weights being equal to one, I think (i.e. the mean of the weights should be equal to one). Otherwise, you may run into numerical issues when the number of source samples is large (the weights all become close to zero).

From my point of view, there is also an issue with the current implementation of DiscriminatorReweightDensityAdapter: I think the weights should be computed with the formula (1 - predicted_proba) / predicted_proba instead of predicted_proba, because the optimal discriminator verifies predicted_proba = p_s / (p_s + p_t) = (w + 1)^-1, cf. the GAN paper.
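A minimal sketch of this corrected weighting (a toy illustration with made-up Gaussian data, not the skada implementation; it assumes the domain classifier's predict_proba gives P(source | x)):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
Xs = rng.normal(0.0, 1.0, size=(500, 2))   # source samples ~ p_s
Xt = rng.normal(0.5, 1.0, size=(500, 2))   # target samples ~ p_t

X = np.vstack([Xs, Xt])
domain = np.concatenate([np.ones(len(Xs)), np.zeros(len(Xt))])  # 1 = source

disc = LogisticRegression().fit(X, domain)
p_source = disc.predict_proba(Xs)[:, 1]    # P(source | x), class-1 column
weights = (1 - p_source) / p_source        # estimate of w(x) = p_t / p_s

print(weights.mean())  # should be roughly 1 for a reasonable discriminator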

@rflamary
Collaborator

I agree with you that this reweighting with just the probabilities is not correct, nor the one from the papers (I would cite the review by Sugiyama instead of the GAN paper, though ;)). OK for weights that sum to n instead of 1, because it indeed changes the proper parameters (C or alpha) in the sklearn models. I just had this problem when implementing the JDOT classifier.
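To make the C/alpha point concrete (a standalone sketch on synthetic data, not from the issue): in LogisticRegression, scaling all sample weights by 1/n is equivalent to scaling C by 1/n, so sum-normalized weights silently change the effective regularization.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
n = len(y)
w = np.random.RandomState(0).rand(n) + 0.5  # arbitrary positive weights

# Mean-normalized weights with C=1 vs sum-normalized weights with C=n:
# the objectives differ only by a constant factor n, so the fits coincide.
clf_mean = LogisticRegression(C=1.0).fit(X, y, sample_weight=w / w.mean())
clf_sum = LogisticRegression(C=float(n)).fit(X, y, sample_weight=w / w.sum())

print(np.allclose(clf_mean.coef_, clf_sum.coef_, atol=1e-4))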

@rflamary
Collaborator

Also, about the Reweight adapters: we should maybe rename them without "density" everywhere. "Density" makes sense for the model that uses a density estimator, but it would make sense to drop it for the Gaussian one (it is implied in a way) and the discriminator one. We did not use "density" for KMM either. As long as "Reweight" is in the name, it is enough, I think.

@antoinedemathelin
Contributor Author

OK, thank you @rflamary,
I will open a PR to correct these few issues.

@rflamary
Collaborator

rflamary commented Mar 1, 2024

Closing this issue, fixed in #118.

@rflamary rflamary closed this as completed Mar 1, 2024