
Weights normalization in ReweightDensity #112

Closed

antoinedemathelin opened this issue Feb 27, 2024 · 8 comments

Comments

@antoinedemathelin
Contributor

Hi everyone,
I wonder why the weights are normalized in ReweightDensity, cf.:

source_weights /= source_weights.sum()

After some tests, I observed that the strange behavior of ReweightDensity in the comparison example is due to the weight normalization, cf. https://scikit-adaptation.github.io/auto_examples/plot_method_comparison.html#sphx-glr-auto-examples-plot-method-comparison-py

Maybe we should remove the normalization, or at least divide by the mean instead of the sum (dividing by the sum makes the weights depend on the dataset size, which should be avoided, I think). See the sketch below.
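To illustrate the scale issue (a quick standalone sketch, not code from skada): sum-normalized weights shrink like 1/n, while mean-normalized weights stay around one regardless of the dataset size.

import numpy as np

rng = np.random.RandomState(0)
for n in (100, 10_000):
    w = rng.rand(n) + 0.5  # arbitrary positive raw weights
    # sum-normalized weights average to 1/n; mean-normalized weights average to 1
    print(n, (w / w.sum()).mean(), (w / w.mean()).mean())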

@BuenoRuben
Contributor

You are right: when comparing the weights obtained with this method to those from the others, it is the only one normalized in such a way (all other reweighting methods have some weights bigger than 1), so it might be better not to normalize it.

@kachayev
Collaborator

From the code, it seems like the normalization here is specifically to produce a density, right? DiscriminatorReweightDensityAdapter also returns a density (normalized), it just does so by calling predict_proba on the classifier.

@BuenoRuben
Contributor

maybe it is a mistake, and we shouldn't produce a density in either case:
those are the two worst-performing reweighting methods, doing worse on the example than when not using DA at all
(see: https://output.circle-artifacts.com/output/job/8955b99a-a58c-4b5c-afb0-8bcb4c545245/artifacts/0/dev/auto_examples/plot_reweighting.html#sphx-glr-auto-examples-plot-reweighting-py )
also, I think the weights are not supposed to form a density; they are just supposed to be the ratio of two densities, and can thus be higher than 1 and sum to more than 1

@antoinedemathelin
Contributor Author

antoinedemathelin commented Feb 28, 2024

Hi,
In fact, the optimal weights should be equal to the density ratio w(x) = p_t(x) / p_s(x).
So, when the number of source samples is large, the weights should approximately average to one (and not sum to one).
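This follows from taking the expectation of the density ratio under the source distribution (a standard identity, added here for reference):

\mathbb{E}_{x \sim p_s}\left[w(x)\right] = \int p_s(x)\,\frac{p_t(x)}{p_s(x)}\,dx = \int p_t(x)\,dx = 1

so the empirical mean of the weights over a large source sample should be close to one.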

Moreover, from a scikit-learn implementation perspective, uniform weighting is equivalent to all weights being equal to one, I think (i.e. the mean of the weights should be equal to one). Otherwise, you may run into numerical issues when the number of source samples is large (the weights all become close to zero).

From my point of view, there is also an issue with the current implementation of DiscriminatorReweightDensityAdapter: I think the weights should be computed with the formula (1 - predicted_proba) / predicted_proba instead of predicted_proba, because the optimal discriminator verifies predicted_proba = p_s / (p_s + p_t) = (w + 1)^-1, cf. the GAN paper.
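A minimal sketch of this corrected weighting (a toy illustration with made-up Gaussian data, not the skada implementation; it assumes the domain classifier's predict_proba gives P(source | x)):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
Xs = rng.normal(0.0, 1.0, size=(500, 2))   # source samples ~ p_s
Xt = rng.normal(0.5, 1.0, size=(500, 2))   # target samples ~ p_t

X = np.vstack([Xs, Xt])
domain = np.concatenate([np.ones(len(Xs)), np.zeros(len(Xt))])  # 1 = source

disc = LogisticRegression().fit(X, domain)
p_source = disc.predict_proba(Xs)[:, 1]    # P(source | x), class-1 column
weights = (1 - p_source) / p_source        # estimate of w(x) = p_t / p_s

print(weights.mean())  # should be roughly 1 for a reasonable discriminator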

@rflamary
Collaborator

I agree with you that this reweighting with just the probabilities is not correct, nor the one from the papers (I would cite the review by Sugiyama instead of the GAN paper, though ;)). OK for weights that sum to n instead of 1, because it indeed changes the proper parameters (C or alpha) in the sklearn models. I just had this problem when implementing the JDOT classifier.
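To make the C/alpha point concrete (a standalone sketch on synthetic data, not from the issue): in LogisticRegression, scaling all sample weights by 1/n is equivalent to scaling C by 1/n, so sum-normalized weights silently change the effective regularization.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
n = len(y)
w = np.random.RandomState(0).rand(n) + 0.5  # arbitrary positive weights

# Mean-normalized weights with C=1 vs sum-normalized weights with C=n:
# the objectives differ only by a constant factor n, so the fits coincide.
clf_mean = LogisticRegression(C=1.0).fit(X, y, sample_weight=w / w.mean())
clf_sum = LogisticRegression(C=float(n)).fit(X, y, sample_weight=w / w.sum())

print(np.allclose(clf_mean.coef_, clf_sum.coef_, atol=1e-4))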

@rflamary
Collaborator

Also, about the Reweight adapters: we should maybe rename them without "density" everywhere. "Density" makes sense for the model that uses a density estimator, but it would make sense to drop it for the Gaussian one (it is implied in a way) and the discriminator one. We did not use "density" for KMM either. As long as "Reweight" is in the name, it is enough, I think.

@antoinedemathelin
Contributor Author

OK, thank you @rflamary,
I will open a PR to correct these few issues.

@rflamary
Collaborator

rflamary commented Mar 1, 2024

Closing this issue, fixed in #118.

@rflamary rflamary closed this as completed Mar 1, 2024