[AnomalyDetection] Add threshold and aggregation functions. #34018
base: master
Conversation
Also includes the following changes:
- Change `prediction` to `predictions` (an iterable) in `AnomalyResult` (a rough sketch of this change follows below).
- Fix some tests that contaminate `_KNOWN_SPECIFIABLE`.
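For context, a hypothetical sketch of the `AnomalyResult` change described in the first bullet; field names other than `predictions` are assumptions for illustration, not the actual Beam definitions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AnomalyPrediction:
  score: Optional[float] = None  # None/NaN score handling is discussed below
  label: Optional[int] = None


@dataclass(frozen=True)
class AnomalyResult:
  example: object                                # the original input element
  predictions: Iterable[AnomalyPrediction] = ()  # was: `prediction` (single)
```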
r: @damccorm
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment …
include_history (bool): If True, include the input predictions in the
`agg_history` of the output. Defaults to False.
Is history a standard term here or something we came up with? If it's not already a standard, it seems a little odd to me; it implies including historical predictions (e.g. what did I predict for the last data point).
I use that to store the pre-aggregated predictions so that users can always go back to check what led to the aggregated result. It is kind of a history to me, but I agree that it may be misleading in some other contexts. If you have a better term, I am more than happy to hear it.
Maybe just `include_input_predictions`? And `input_predictions` instead of `agg_history`?
How about `include_source_predictions` and `source_predictions`?
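To make the proposed renaming concrete, a minimal hypothetical sketch; the dataclass and `aggregate` helper below are assumptions for illustration, not the PR's actual code.

```python
import statistics
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Prediction:
  score: Optional[float] = None
  source_predictions: List["Prediction"] = field(default_factory=list)


def aggregate(
    predictions: List[Prediction],
    include_source_predictions: bool = False) -> Prediction:
  scores = [p.score for p in predictions if p.score is not None]
  result = Prediction(score=statistics.mean(scores) if scores else None)
  if include_source_predictions:
    # Keep the pre-aggregated predictions so users can go back and check
    # what led to the aggregated result (what `agg_history` stored before).
    result.source_predictions = list(predictions)
  return result
```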
""" | ||
scores = [ | ||
prediction.score for prediction in predictions | ||
if prediction.score is not None and not math.isnan(prediction.score) |
I think it might make sense to define a score for predictions which have no score but have a label (and vice versa). Or maybe we can just throw? I guess this relates to my question above: when would we expect this scoreless condition to happen?
Thanks for raising this important point.
Let me take a step back and clarify the workflow we implemented here:
[input] -> detector -> [score] -> threshold_fn -> [label] -> aggregation_fn -> [aggregated label]
- First, we are trying to address scenarios where a detector generates a score of `None` or `NaN`. In my opinion, we can distinguish between these two cases:
  - The detector is NOT ready to give a prediction. This could imply that the detector needs some warm-up time before the first inference can be made.
  - The detector is ready to predict, but something goes wrong during the prediction process. For example, the input data could be ill-formatted, or the detector is simply not able to make a prediction on this input.

  We can use `None` to represent the first case and `NaN` for the second one. The rationale is that a `None` value is something we don't know yet, but recoverable (if we feed the input into the detector once it is ready to score), whereas `NaN` comes from an error during prediction and can never be recovered.

- After we have `None` and `NaN` scores, the `threshold_fn` needs to decide how to assign labels to them (a rough sketch of this handling follows below).
  - In the current implementation, I only consider `None` and assign a normal label to it, which may be ok, because we don't want to flag outliers when the detector is still warming up. Alternatively, we could set the label to `None`, which means we defer the decision to other detectors.
  - For the irrecoverable `NaN` score, I think we can assign an outlier label.

- When multiple labels are ready for aggregation, it is reasonable to apply the `aggregation_fn` to all the non-None labels. If they are all `None` (the situation you mentioned in the previous comment), maybe we can expose another parameter in the aggregation function as an undecided default.
WDYT?
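A rough sketch of the handling described above, under this comment's own assumptions (a `None` score means the detector is not ready, `NaN` means the prediction failed); the function names, label values, and majority-vote choice are illustrative, not the actual implementation.

```python
import math
from typing import List, Optional

NORMAL, OUTLIER = 0, 1


def threshold_fn(score: Optional[float], cutoff: float = 0.5) -> Optional[int]:
  if score is None:
    # Detector is still warming up: assign a normal label, or return None
    # instead to defer the decision to other detectors.
    return NORMAL
  if math.isnan(score):
    # Prediction failed and cannot be recovered: flag it as an outlier.
    return OUTLIER
  return OUTLIER if score >= cutoff else NORMAL


def aggregation_fn(labels: List[Optional[int]],
                   undecided_default: Optional[int] = None) -> Optional[int]:
  # Apply a majority vote over the non-None labels; fall back to an
  # "undecided" default when every label is None.
  decided = [label for label in labels if label is not None]
  if not decided:
    return undecided_default
  return OUTLIER if decided.count(OUTLIER) * 2 >= len(decided) else NORMAL
```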
Generally, I like this approach and think that using NaN/None for weird failure/intentional no output is reasonable.

> We can use None to represent the first case and NaN for the second one. The rationale is that a None value is something we don't know yet, but recoverable (if we feed the input into the detector once it is ready to score), whereas NaN comes from an error during prediction and can never be recovered.

I'd actually flip these. I think `None` is more likely to happen because of some user mistake (e.g. I'm using a predictor that outputs a label in a situation that expects a score or vice versa), whereas `NaN` is an intentional choice.

> When multiple labels are ready for aggregation, it is reasonable to apply the aggregation_fn to all the non-None labels. If they are all None (the situation you mentioned in the previous comment), maybe we can expose another parameter in the aggregation function as an undecided default.

I think if all the detectors agree (whether that is a None or a NaN), it makes sense to match whatever they are outputting. If they're all inconclusive, an inconclusive result makes sense. If they're all errors, an error result makes sense. Thoughts?
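A hedged sketch of this suggestion as it might apply to scores; the function below is hypothetical, and the mean is just a stand-in for whatever aggregation is actually used.

```python
import math
import statistics
from typing import List, Optional


def aggregate_scores(scores: List[Optional[float]]) -> Optional[float]:
  # If every detector returns the same kind of "no result", pass it through.
  if scores and all(s is None for s in scores):
    return None              # all inconclusive -> inconclusive result
  if scores and all(s is not None and math.isnan(s) for s in scores):
    return float("nan")      # all errors -> error result
  # Otherwise aggregate only the usable scores.
  usable = [s for s in scores if s is not None and not math.isnan(s)]
  return statistics.mean(usable) if usable else None
```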
> I'd actually flip these. I think None is more likely to happen because of some user mistake (e.g. I'm using a predictor that outputs a label in a situation that expects a score or vice versa), whereas NaN is an intentional choice.

I am fine with that.

> I think if all the detectors agree (whether that is a None or a NaN), it makes sense to match whatever they are outputting. If they're all inconclusive, an inconclusive result makes sense. If they're all errors, an error result makes sense. Thoughts?

Hmmm... Are you saying that, other than the normal and outlier labels, we will have to add two labels for the cases of score=None and score=NaN, respectively? This will get a bit complicated; for example, one model could say None while another says NaN.
On second thought, maybe it is not as complicated as it seems. Let me see if it can be sorted out.
Ok. I made the changes. PTAL.
Also includes the following adjustments per reviewer's feedback:
- Rename `include_history` to `include_source_predictions`.
- Get rid of the unnecessary cast.
This is a follow-up PR of #33845 and #33994.