[MRG] PredictionEntropyScorer output negative scores #63
Conversation
**Codecov Report** — All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff            @@
##             main      #63    +/-  ##
========================================
+ Coverage   88.83%   88.95%   +0.12%
========================================
  Files          41       41
  Lines        2678     2708      +30
========================================
+ Hits         2379     2409      +30
  Misses        299      299
```

☔ View full report in Codecov by Sentry.
Actually there was another mistake. Now we directly compute the mean of the per-sample entropies.
I reread the paper. For the entropy scorer, we want to minimize the entropy, so
Maybe we can change the name |
Yes, my bad, you're right.
I think the best thing is to use a sum instead of the mean and implement what is done in the paper. I think the mean was a mistake. We can plot it like a loss.
Not a fan of removing the mean. If we do that, the output will be proportional to the number of samples, so it will be hard to compare methods' effectiveness across different datasets.
Yeah, maybe a parameter
Good idea to handle both (state in the doc that one of them corresponds to the paper).
That's an interesting question... Usually, I'm a 'theory absolutist' when it comes to questions like this one. If we call it 'entropy', it should perform a 'sum'; otherwise it's no longer entropy.

In PyTorch, the 'reduction' parameter for losses is typically (as it should be) a batch-related setting: it only tells the system what to do with the fact that the loss is typically a per-item quantity, which means that for a batch we have an array of those, while gradient descent requires a scalar. Thus it provides 'sum' or 'mean' options for how to collapse the array of values into a scalar. This doesn't change the nature of the loss itself, just the mini-batch GD. Also, from the gradient perspective it's just a matter of a multiplicative scalar.

You are right that with a larger number of samples the max entropy of the system increases. As it should... The hypothesis that 'entropy divided by the number of samples' makes for a better comparison between different methods is interesting, though by no means obvious or intuitive.
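The 'reduction' idea discussed above could be sketched as follows. This is a minimal illustration, not the actual skada implementation; the function name and signature are hypothetical:

```python
import numpy as np

def prediction_entropy(proba, reduction="sum", eps=1e-12):
    """Entropy of predicted class probabilities.

    proba : array of shape (n_samples, n_classes), rows summing to 1.
    reduction : "sum" matches the paper's definition (total entropy
        over the set); "mean" divides by n_samples, making values
        comparable across datasets of different sizes.
    """
    # Per-sample entropy: -<f(x), log f(x)>, always >= 0 since log(p) <= 0.
    per_sample = -np.sum(proba * np.log(proba + eps), axis=1)
    if reduction == "sum":
        return per_sample.sum()
    elif reduction == "mean":
        return per_sample.mean()
    raise ValueError(f"unknown reduction: {reduction!r}")
```

With this design, "sum" and "mean" differ only by the constant factor `n_samples`, which is consistent with the point above that reduction does not change the nature of the quantity.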
Also,
Where is this stated for sklearn scores to be from a particular range? I'm not sure I've seen this requirement. |
It's not a requirement, but to the best of my recollection, every sklearn metric whose name ends in 'score'/'scorer' is upper-bounded by 1; otherwise, they are named 'loss', 'error', and so on.
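Related to the naming convention: in scikit-learn a scorer's return value is always interpreted as "higher is better", so wrapping a loss with `make_scorer(..., greater_is_better=False)` negates it, which is why such scorers return negative values. A small illustration (the variable names are mine):

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer, mean_squared_error

# Wrapping a loss with greater_is_better=False makes make_scorer negate
# it, so the resulting scorer outputs values <= 0.
neg_mse = make_scorer(mean_squared_error, greater_is_better=False)

X = np.zeros((4, 1))
y = np.array([0.0, 1.0, 2.0, 3.0])
# DummyRegressor with strategy="mean" always predicts y.mean() == 1.5.
est = DummyRegressor(strategy="mean").fit(X, y)

print(neg_mse(est, X, y))  # negated MSE, a non-positive number
```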
Issue: `PredictionEntropyScorer` outputs negative scores.

Solution: In `PredictionEntropyScorer` we compute the following equation (page 3 of the paper: https://arxiv.org/pdf/1711.10288.pdf):

$$E(X_T) = - \sum_{x_t \in T} \langle f(x_t; \theta), \log f(x_t; \theta) \rangle$$

In the code we forgot to use the `greater_is_better` flag, and we used a double minus sign in the formula.

+ Add a test case checking that all computed scores are > 0.
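One plausible reading of the double-minus bug, sketched in isolation (illustrative only, not the actual skada code):

```python
import numpy as np

rng = np.random.default_rng(0)
# Valid probability rows: each row is drawn from a Dirichlet, sums to 1.
proba = rng.dirichlet(np.ones(3), size=5)
log_proba = np.log(proba)

# Buggy version: an extra minus sign flips the result negative,
# i.e. it computes -E(X_T) <= 0.
buggy = np.sum(proba * log_proba)

# Fixed version: single minus, matching the paper's equation, so
# E(X_T) >= 0 always (since log(p) <= 0 for p in (0, 1]).
fixed = -np.sum(proba * log_proba)

assert buggy <= 0 and fixed >= 0
```

A test like the last assertion, run over arbitrary valid probability matrices, is what the "all scores > 0" test case would check.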