[AnomalyDetection] Add threshold and aggregation functions. #34018
base: master
Conversation
Also includes the following changes:
- Change `prediction` to `predictions` (an iterable) in `AnomalyResult` (a rough sketch of this change follows below).
- Fix some tests that contaminate `_KNOWN_SPECIFIABLE`.
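For context, a hypothetical sketch of the `AnomalyResult` change described in the first bullet; field names other than `predictions` are assumptions for illustration, not the actual Beam definitions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AnomalyPrediction:
  score: Optional[float] = None  # None/NaN score handling is discussed below
  label: Optional[int] = None


@dataclass(frozen=True)
class AnomalyResult:
  example: object                                # the original input element
  predictions: Iterable[AnomalyPrediction] = ()  # was: `prediction` (single)
```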
r: @damccorm
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment …
include_history (bool): If True, include the input predictions in the
`agg_history` of the output. Defaults to False.
Is history a standard term here or something we came up with? If it's not already a standard, it seems a little odd to me; it implies including historical predictions (e.g. what did I predict for the last data point).
I use that to store the pre-aggregated predictions so that users can always go back to check what led to the aggregated result. It is kind of a history to me, but I agree that it may be misleading in some other contexts. If you have a better term, I am more than happy to hear it.
Maybe just `include_input_predictions`? And `input_predictions` instead of `agg_history`?
How about `include_source_predictions` and `source_predictions`?
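To make the proposed renaming concrete, a minimal hypothetical sketch; the dataclass and `aggregate` helper below are assumptions for illustration, not the PR's actual code.

```python
import statistics
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Prediction:
  score: Optional[float] = None
  source_predictions: List["Prediction"] = field(default_factory=list)


def aggregate(
    predictions: List[Prediction],
    include_source_predictions: bool = False) -> Prediction:
  scores = [p.score for p in predictions if p.score is not None]
  result = Prediction(score=statistics.mean(scores) if scores else None)
  if include_source_predictions:
    # Keep the pre-aggregated predictions so users can go back and check
    # what led to the aggregated result (what `agg_history` stored before).
    result.source_predictions = list(predictions)
  return result
```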
""" | ||
scores = [ | ||
prediction.score for prediction in predictions | ||
if prediction.score is not None and not math.isnan(prediction.score) |
I think it might make sense to define a score for predictions which have no score but have a label (and vice versa). Or maybe we can just throw? I guess this relates to my question above: when would we expect this scoreless condition to happen?
Thanks for raising this important point.
Let me take a step back and clarify the workflow we implemented here:
[input] -> detector -> [score] -> threshold_fn -> [label] -> aggregation_fn -> [aggregated label]
- First, we are trying to address scenarios where a detector generates a score of `None` or `NaN`. In my opinion, we can distinguish between these two cases:
  - The detector is NOT ready to give a prediction. This could imply that the detector needs some warm-up time before the first inference can be made.
  - The detector is ready to predict, but something goes wrong during the prediction process. For example, the input data could be ill-formatted, or the detector is simply not able to make a prediction on this input.

  We can use `None` to represent the first case and `NaN` for the second one. The rationale is that a `None` value is something we don't know yet, but recoverable (if we feed the input into the detector once it is ready to score), whereas `NaN` comes from an error during prediction and can never be recovered.

- After we have `None` and `NaN` scores, the `threshold_fn` needs to decide how to assign labels to them (a rough sketch of this handling follows below).
  - In the current implementation, I only consider `None` and assign a normal label to it, which may be ok, because we don't want to flag outliers when the detector is still warming up. Alternatively, we could set the label to `None`, which means we defer the decision to other detectors.
  - For the irrecoverable `NaN` score, I think we can assign an outlier label.

- When multiple labels are ready for aggregation, it is reasonable to apply the `aggregation_fn` to all the non-None labels. If they are all `None` (the situation you mentioned in the previous comment), maybe we can expose another parameter in the aggregation function as an undecided default.
WDYT?
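A rough sketch of the handling described above, under this comment's own assumptions (a `None` score means the detector is not ready, `NaN` means the prediction failed); the function names, label values, and majority-vote choice are illustrative, not the actual implementation.

```python
import math
from typing import List, Optional

NORMAL, OUTLIER = 0, 1


def threshold_fn(score: Optional[float], cutoff: float = 0.5) -> Optional[int]:
  if score is None:
    # Detector is still warming up: assign a normal label, or return None
    # instead to defer the decision to other detectors.
    return NORMAL
  if math.isnan(score):
    # Prediction failed and cannot be recovered: flag it as an outlier.
    return OUTLIER
  return OUTLIER if score >= cutoff else NORMAL


def aggregation_fn(labels: List[Optional[int]],
                   undecided_default: Optional[int] = None) -> Optional[int]:
  # Apply a majority vote over the non-None labels; fall back to an
  # "undecided" default when every label is None.
  decided = [label for label in labels if label is not None]
  if not decided:
    return undecided_default
  return OUTLIER if decided.count(OUTLIER) * 2 >= len(decided) else NORMAL
```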
Generally, I like this approach and think that using NaN/None for weird failure/intentional no output is reasonable.

> We can use None to represent the first case and NaN for the second one. The rationale is that a None value is something we don't know yet, but recoverable (if we feed the input into the detector once it is ready to score), whereas NaN comes from an error during prediction and can never be recovered.

I'd actually flip these. I think `None` is more likely to happen because of some user mistake (e.g. I'm using a predictor that outputs a label in a situation that expects a score or vice versa), whereas `NaN` is an intentional choice.

> When multiple labels are ready for aggregation, it is reasonable to apply the aggregation_fn to all the non-None labels. If they are all None (the situation you mentioned in the previous comment), maybe we can expose another parameter in the aggregation function as an undecided default.

I think if all the detectors agree (whether that is a None or a NaN), it makes sense to match whatever they are outputting. If they're all inconclusive, an inconclusive result makes sense. If they're all errors, an error result makes sense. Thoughts?
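A hedged sketch of this suggestion as it might apply to scores; the function below is hypothetical, and the mean is just a stand-in for whatever aggregation is actually used.

```python
import math
import statistics
from typing import List, Optional


def aggregate_scores(scores: List[Optional[float]]) -> Optional[float]:
  # If every detector returns the same kind of "no result", pass it through.
  if scores and all(s is None for s in scores):
    return None              # all inconclusive -> inconclusive result
  if scores and all(s is not None and math.isnan(s) for s in scores):
    return float("nan")      # all errors -> error result
  # Otherwise aggregate only the usable scores.
  usable = [s for s in scores if s is not None and not math.isnan(s)]
  return statistics.mean(usable) if usable else None
```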
> I'd actually flip these. I think None is more likely to happen because of some user mistake (e.g. I'm using a predictor that outputs a label in a situation that expects a score or vice versa), whereas NaN is an intentional choice.

I am fine with that.

> I think if all the detectors agree (whether that is a None or a NaN), it makes sense to match whatever they are outputting. If they're all inconclusive, an inconclusive result makes sense. If they're all errors, an error result makes sense. Thoughts?

Hmmm... Are you saying that, other than the normal and outlier labels, we will have to add two labels for the cases of score=None and score=NaN, respectively? This will get a bit complicated; for example, one model could say None while another says NaN.
On second thought, maybe it is not as complicated as it seems. Let me see if it can be sorted out.
Ok. I made the changes. PTAL.
Also includes the following adjustments per reviewer's feedback:
- Rename `include_history` to `include_source_predictions`.
- Get rid of the unnecessary cast.
This is a follow-up PR of #33845 and #33994.