This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

General publisher plugin error handling questions #964

Closed
iceycake opened this issue Jun 1, 2016 · 5 comments

Comments

@iceycake
Contributor

iceycake commented Jun 1, 2016

I am creating a task that collects some system metrics using the built-in plugins and publishes them to TSDB. Because the TSDB cluster is under extremely high load, the TSDB publisher occasionally times out.

Some questions:

  1. How can I configure the number of consecutive errors allowed before the task is disabled? Based on the error message, I believe the current setting is 10.
  2. Should retry be part of the publisher plugin logic or of the Snap task?
  3. Does Snap buffer metrics if an error occurs while publishing?

@lynxbat
Contributor

lynxbat commented Jun 1, 2016

I am going to let someone like @jcooklin or @tjmcs comment on #1.

On #2, I would vote no, since the workflow may give up once its deadline passes, before the plugin's own retries have finished.

For #2 and #3, we have an RFC being written now for optionally buffering both processor and publisher calls for a configurable period of time, with retry. This would prevent workflow data from being lost and allow reentry into processor and publisher calls that time out. It should solve your problem without requiring the publisher to maintain more state.

We don't expect the new RFC to be very difficult to implement.

@IRCody
Contributor

IRCody commented Jun 1, 2016

The failure limit is 10. Right now it is set in the REST layer. It's pretty low-hanging fruit if we want to make it configurable.

@iceycake
Contributor Author

iceycake commented Jun 1, 2016

@IRCody It looks like a very easy fix to make it configurable. I wonder whether it would make sense to support passing in -1 to mean the task should never fail (that is, never be disabled).
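A minimal sketch of a configurable consecutive-failure limit with the -1 "never disable" sentinel suggested here. This is not Snap's actual task code; `Task`, `MaxFailures`, `RecordRun`, `NeverDisable`, and `ErrPublishTimeout` are hypothetical names, and it assumes a successful run resets the failure streak.

```go
package main

import "errors"

// NeverDisable is the hypothetical -1 sentinel: the task keeps running no
// matter how many runs fail in a row.
const NeverDisable = -1

// ErrPublishTimeout is a hypothetical error used to simulate a failed run.
var ErrPublishTimeout = errors.New("publish timed out")

// TaskState is a simplified, hypothetical task state.
type TaskState int

const (
	Running TaskState = iota
	Disabled
)

// Task tracks consecutive failures against a configurable limit.
type Task struct {
	MaxFailures int // failures in a row allowed before disabling; NeverDisable turns this off
	State       TaskState
	failures    int // current streak of consecutive failed runs
}

// RecordRun updates the failure streak after one run and disables the task
// once the streak reaches MaxFailures.
func (t *Task) RecordRun(err error) {
	if err == nil {
		t.failures = 0 // any success resets the streak
		return
	}
	t.failures++
	if t.MaxFailures != NeverDisable && t.failures >= t.MaxFailures {
		t.State = Disabled
	}
}
```

With `MaxFailures: 10`, nine timeouts followed by a success leave the task running, while ten timeouts in a row disable it; with `MaxFailures: NeverDisable` the task is never disabled.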

@jcooklin
Collaborator

jcooklin commented Jun 1, 2016

#967 has been created to capture #1

For #2, I feel it might make sense for some plugins to implement their own internal retry, independent of the framework, which will retry a task some number (x) of times before placing it in a disabled state.

@iceycake: Related to #3, someone else recently requested spooling on failure (process/publish), so I just opened a separate issue for it (#966). Any comments you have on #966 would be greatly appreciated.

@kjlyon
Contributor

kjlyon commented Dec 16, 2016

Hey @iceycake, thank you again for your great questions! I just wanted to complete the loop on this issue and make sure we answered everything before closing. The first point was resolved by issue #967. The second was addressed by @jcooklin above. And for the third, I would suggest continuing the conversation over on issue #966. Please feel free to add your comments there or open up a new issue if we missed something. Thanks again for your questions and for contributing to Snap!

@kjlyon closed this as completed Dec 16, 2016