General publisher plugin error handling questions #964

iceycake · 2016-06-01T11:57:14Z

I am creating a task that collects some system metrics using the built-in plugins and publishes them to TSDB. Because of the extremely high load on the TSDB cluster, the TSDB publisher occasionally times out.

Some questions:

1. How do I configure the number of consecutive errors before the task is disabled? Based on the error, I believe the setting is currently 10.
2. Should retry be part of the publisher plugin logic or of the Snap task?
3. Does Snap buffer metrics if there is an error when trying to publish?

lynxbat · 2016-06-01T15:44:29Z

I am going to let someone like @jcooklin or @tjmcs comment on #1.

On #2 I would vote now since the deadline time on a workflow may give up.

For #2 and #3 we have an RFC being written now for optionally buffering both processor and publisher calls for a configurable period of time, with retry. This would prevent workflow data from being lost and allow re-entry into processor and publisher calls that time out. This should solve your problem without having to make the publisher maintain more state.

We don't expect the new RFC to be very difficult to implement.
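
To make the buffering-with-retry idea above concrete, here is a minimal sketch in Go. It is not the RFC and not Snap's API: `Metric`, `PublishFunc`, `RetryBuffer`, and `ErrPublishTimeout` are all hypothetical names invented for illustration. It assumes failed batches are kept in order, retried on the next call, and dropped once they are older than a configurable period.

```go
package main

import (
	"errors"
	"time"
)

// ErrPublishTimeout simulates a publisher timeout (hypothetical, for illustration).
var ErrPublishTimeout = errors.New("publish timed out")

// Metric is a hypothetical stand-in for a collected metric.
type Metric struct {
	Namespace string
	Value     float64
}

// PublishFunc is a hypothetical publish call; it is not Snap's plugin interface.
type PublishFunc func(batch []Metric) error

type pendingBatch struct {
	metrics  []Metric
	queuedAt time.Time
}

// RetryBuffer keeps batches whose publish call failed and retries them
// in order until they succeed or become older than maxAge.
type RetryBuffer struct {
	publish PublishFunc
	maxAge  time.Duration
	pending []pendingBatch
}

// NewRetryBuffer wraps publish with a buffer that holds batches for at most maxAge.
func NewRetryBuffer(publish PublishFunc, maxAge time.Duration) *RetryBuffer {
	return &RetryBuffer{publish: publish, maxAge: maxAge}
}

// Publish queues the batch behind any older ones and tries to flush everything.
// It returns the number of batches dropped because they expired.
func (b *RetryBuffer) Publish(batch []Metric, now time.Time) int {
	b.pending = append(b.pending, pendingBatch{metrics: batch, queuedAt: now})
	return b.Flush(now)
}

// Flush retries pending batches oldest first and stops at the first failure,
// so ordering is preserved. Expired batches are dropped and counted.
func (b *RetryBuffer) Flush(now time.Time) int {
	dropped := 0
	for len(b.pending) > 0 {
		head := b.pending[0]
		if now.Sub(head.queuedAt) > b.maxAge {
			b.pending = b.pending[1:]
			dropped++
			continue
		}
		if err := b.publish(head.metrics); err != nil {
			return dropped // keep head and everything behind it for the next attempt
		}
		b.pending = b.pending[1:]
	}
	return dropped
}

// Pending reports how many batches are still waiting to be published.
func (b *RetryBuffer) Pending() int { return len(b.pending) }
```

With a wrapper like this, the publisher itself stays stateless: the caller passes each new batch to `Publish` and the buffer decides what is retried or dropped.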

IRCody · 2016-06-01T17:33:21Z

The setting for failures is 10. Right now it's set in rest here. Pretty low hanging fruit if we wanted to make it configurable.

iceycake · 2016-06-01T18:30:16Z

@IRCody Looks like it's a very easy fix to make it configurable. I am wondering whether it would make sense to never fail the task when -1 is passed in.
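
A rough sketch of what a configurable limit with a -1 "never disable" value could look like, in Go. This is not Snap's scheduler code: `FailureTracker`, `NeverDisable`, and `ErrPublishTimeout` are hypothetical names, and the assumption is that only consecutive failures count, so a successful run resets the counter.

```go
package main

import "errors"

// ErrPublishTimeout simulates a publisher timeout (hypothetical, for illustration).
var ErrPublishTimeout = errors.New("publish timed out")

// NeverDisable is the hypothetical sentinel value discussed in the thread.
const NeverDisable = -1

// FailureTracker counts consecutive failed runs of a task.
type FailureTracker struct {
	maxFailures int // consecutive failures that disable the task; NeverDisable turns the limit off
	consecutive int
	disabled    bool
}

// NewFailureTracker returns a tracker with the given limit (the thread mentions 10 as the current value).
func NewFailureTracker(maxFailures int) *FailureTracker {
	return &FailureTracker{maxFailures: maxFailures}
}

// Record registers the outcome of one run and reports whether the task is now disabled.
func (f *FailureTracker) Record(err error) bool {
	if f.disabled {
		return true
	}
	if err == nil {
		f.consecutive = 0 // a success resets the streak
		return false
	}
	f.consecutive++
	if f.maxFailures != NeverDisable && f.consecutive >= f.maxFailures {
		f.disabled = true
	}
	return f.disabled
}
```

Under these assumptions, `NewFailureTracker(10)` mirrors the current behavior and `NewFailureTracker(NeverDisable)` keeps the task running no matter how many runs fail in a row.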

jcooklin · 2016-06-01T20:13:18Z

#967 has been created to capture #1

For #2 I feel it might make sense for some plugins to implement their own internal retry separate from the fact that the framework will retry a task some x number of times before the task will be placed in a disabled state.
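
As an illustration of plugin-internal retry, here is a small Go sketch. It is only one possible approach and uses no Snap API: `PublishWithRetry`, its parameters, and `ErrPublishTimeout` are hypothetical. It assumes the plugin is given a time budget (for example, something shorter than the workflow deadline mentioned earlier) and retries with a doubling wait until the next wait would exceed that budget.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrPublishTimeout simulates a publisher timeout (hypothetical, for illustration).
var ErrPublishTimeout = errors.New("publish timed out")

// PublishWithRetry calls publish until it succeeds or until waiting for the
// next attempt would overrun budget. The wait doubles after every failure.
// backoff should be positive; with a zero backoff the loop retries without
// pausing until the budget is used up.
func PublishWithRetry(budget, backoff time.Duration, publish func() error) error {
	deadline := time.Now().Add(budget)
	for {
		err := publish()
		if err == nil {
			return nil
		}
		// Give up if sleeping again would take us past the deadline.
		if time.Now().Add(backoff).After(deadline) {
			return fmt.Errorf("giving up within budget %v: %w", budget, err)
		}
		time.Sleep(backoff)
		backoff *= 2
	}
}
```

If the internal retry still fails, the error is returned to the framework, which counts it toward its own failure limit. Keeping the budget below the workflow deadline avoids the case where the framework gives up while the plugin is still retrying.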

@iceycake: Related to #3, someone else recently requested spooling on failure (process/publish), so I just added a separate issue for it (#966). Any comments you have on #966 would be greatly appreciated.

kjlyon · 2016-12-16T22:30:59Z

Hey @iceycake, thank you again for your great questions! I just wanted to complete the loop on this issue and make sure we answered everything before closing. The first point was resolved by issue #967. The second was addressed by @jcooklin above. And for the third, I would suggest continuing the conversation over on issue #966. Please feel free to add your comments there or open up a new issue if we missed something. Thanks again for your questions and for contributing to Snap!

IRCody added the type/question label Jun 1, 2016

snapbot added the tracked label Jul 9, 2016

IzabellaRaulin mentioned this issue Aug 16, 2016

Initial edit to labeling strategy #1112

Merged

kjlyon closed this as completed Dec 16, 2016
