[RFC] Conditional Routing #1007
Comments
I really like this! It is clever to set the routes in the EventMetadata. I do think users will want the
The thing I would like some more clarity on is what really changes from the router's perspective for sinks vs. processors. Even though sinks don't have ordering, it seems like the implementation would be the same.
In the first alternative, should the second router listing say
@dapowers87, that was a typo, and I updated it. Thanks for catching and noting it!
I really like the idea of the first alternative. Having a router be a base component sounds like it could open itself up to a lot of neat ideas down the road.
What if we made
pipeline-a:
  source:
    http:
  to-router: router-a
pipeline-b:
  processor:
    grok: ...
    add_entry: ...
  to-router: router-b
pipeline-c:
  processor:
    grok: ...
    add_entry: ...
  sink:
    opensearch: ...
pipeline-d:
  processor:
    grok: ...
    add_entry: ...
  to-router: router-c
pipeline-sink-a:
  sink:
    opensearch: ...
    stdout:
pipeline-sink-b:
  sink:
    opensearch: ...
router:
  - name: router-a
    set_routes:
      - pipeline-b: "some_conditional"
      - pipeline-c: "secondary_conditional"
  - name: router-b
    set_routes:
      - pipeline-c: "third_conditional"
      - pipeline-d: "fourth_conditional"
  - name: router-c
    set_routes:
      - pipeline-sink-a: "fifth_conditional"
      - pipeline-sink-b: "sixth_conditional"
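As a rough illustration of how the named routers in the configuration above might dispatch an Event, here is a minimal Python sketch. It is not Data Prepper code: the `ROUTERS` table simply mirrors the first router above, `next_pipelines` is a hypothetical helper, and the sketch assumes an Event is forwarded to every pipeline whose conditional matches (the comment does not say whether routing is first-match or all-match). The conditionals are modeled as plain Python callables looked up by name.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory mirror of part of the `router` section above:
# router name -> ordered (target pipeline, conditional name) pairs.
ROUTERS: Dict[str, List[Tuple[str, str]]] = {
    "router-a": [
        ("pipeline-b", "some_conditional"),
        ("pipeline-c", "secondary_conditional"),
    ],
}


def next_pipelines(
    router: str,
    event: dict,
    routers: Dict[str, List[Tuple[str, str]]],
    conditions: Dict[str, Callable[[dict], bool]],
) -> List[str]:
    """Return every target pipeline whose conditional matches the event.

    Assumption: all matching targets receive the event (all-match routing).
    """
    return [
        pipeline
        for pipeline, condition_name in routers[router]
        if conditions[condition_name](event)
    ]
```

With this shape, an Event that matches no conditional yields an empty list, so whether such Events are dropped or sent to a default target would need to be decided explicitly.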
Can this already be achieved like so:
An alternative I would like to propose is making a
@graytaylor0, the original conditional would be evaluated once, but every downstream processor / sink / core will need to evaluate the route.
Yes but evaluating if a
I could be wrong about this, but I am fairly sure that this would not work. I think that in the current state, a pipeline can only receive Events from another pipeline by specifying that pipeline as the source.
There are a few reasons we cannot do this now.
It is for the third reason that I proposed a router processor. These routers would apply routes to Events early in a pipeline and then allow them to route differently later.
This approach is somewhat similar to some of the alternatives I described (though it moves from Core to a plugin). If Data Prepper had a
The discussions indicate some additional interest in a solution similar to the first alternative provided. I want to clarify the similarities and differences between the proposal and the first alternative. Then I'll elaborate on the specifics of the first alternative. Finally, I'll list some pros and cons for each.
Similarities
Both approaches use a router prior to sending data to sinks. The diagram below shows where events are routed.
Differences
First, I'd like to make a distinction between two phases of routing.
The primary proposal and the alternatives in this RFC have two different approaches for evaluation of routes.
Pre-Evaluation
The current proposal to include a
Just-in-Time Evaluation
The first alternative is to support just-in-time evaluation of routes. In this approach, the router will apply both Routing Evaluation and Routing Application. The following diagram shows how this works conceptually.
Details on First Alternative
The YAML syntax for the first alternative is repeated below.
While the
Multiple Routes
This approach can also support the
Achieving Similar Behavior to Pre-Evaluation
Pipeline authors can achieve a similar result as pre-evaluation using the new mutate processors, after they have support for conditional configuration. Pipeline authors will be able to conditionally set user-defined fields on Events in a mutate processor. In the router definition, they can configure the routing based on the value of the user-defined field. The following shows a conceptual example. Note that the
The only significant downside is that the field is now part of the Event and the Sink will send it to the final destination system. This can still be worked around by routing to a second pipeline which drops the field from the Event.
Pros & Cons of the Approaches
This section considers some advantages and disadvantages to each approach.
Performance
I expect the performance of either routing approach to be similar. There may be some subtle differences depending on the pipelines. For both approaches, evaluation and routing need to take place. Both approaches would iterate over events, evaluate the events against expressions, and then place those events into new collections. This yields a time complexity of O(n) for both approaches, where n is the number of events. Space complexity will be O(n), where n is the number of Events since new pointers are needed.
Both approaches do use the concept of "named routes". This may yield some performance gains because expressions do not need to be reevaluated. As a counter example, see the following syntax. In this type of router, the
Both the proposal and the alternative avoid this duplicate evaluation. However, this is not likely to have a significant impact on the overall performance of the pipeline.
Ease-of-Use
Because these two approaches apply routes in different places, the approach taken will have an impact on how easy and intuitive Data Prepper is for pipeline authors. I expect that placing conditionals closer to the routing, as just-in-time evaluation does, will be easier for pipeline authors to grasp. The ability to add routes in one pipeline and have them continue to apply in others may be useful for robust pipeline authors. But it could also cause confusion for new pipeline authors. The just-in-time evaluation yields simpler usage here as well.
Visualization
One long-term possibility for Data Prepper is to support pipeline visualizations, in particular the ability to build pipelines graphically. The alternative approach (just-in-time evaluation) may be better suited for this because the conditions sit directly on the lines where the routing occurs.
This diagram shows how the pre-evaluation might be visualized. As you can see, the routes are defined as properties on the
In the diagram for just-in-time evaluation, the conditions can be visualized directly on the arrows.
Flexibility
The pre-evaluation route may offer more flexibility. In particular, it can nicely accommodate routes on processors. Any processor could be executed only for events of a certain route. This could allow for a weaving of Events in and out of processors. This may be powerful, but could also be hard to visualize.
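To make the pre-evaluation versus just-in-time distinction from this comment concrete, here is a minimal Python sketch. It is illustrative only and not Data Prepper code: the `Event` class, its `routes` metadata set, and the three helper functions are all hypothetical names, and conditions are modeled as plain callables rather than the expression syntax from #1005.

```python
from typing import Callable, Dict, List, Set

Condition = Callable[[dict], bool]


class Event:
    """Hypothetical event: a data dict plus a metadata set of route names."""

    def __init__(self, data: dict) -> None:
        self.data = dict(data)
        self.routes: Set[str] = set()


def pre_evaluate(events: List[Event], conditions: Dict[str, Condition]) -> None:
    """Routing Evaluation done early: tag each event with its matching routes."""
    for event in events:
        for name, condition in conditions.items():
            if condition(event.data):
                event.routes.add(name)


def apply_routes(events: List[Event], route: str) -> List[Event]:
    """Routing Application: select events already tagged with the route."""
    return [event for event in events if route in event.routes]


def just_in_time(events: List[Event], condition: Condition) -> List[Event]:
    """Evaluation and application together, at the moment of routing."""
    return [event for event in events if condition(event.data)]
```

Each function makes a single pass over the events, which is consistent with the O(n) estimate above. The behavioral difference shows up when a field is removed between evaluation and routing: tags written by `pre_evaluate` survive the removal, whereas `just_in_time` only sees the Event as it exists when routing happens.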
@cmanning09, I want to be sure to address the idea of a "router-as-a-sink" which you proposed. In this comment, I want to compare it similarly to how I compared the pre-evaluation and just-in-time evaluation in my last comment.
Performance
I do expect that the performance will be similar. The time and space complexity should both remain O(n). The main difference may lie in the threading model. Each Sink runs in parallel currently in some shared threads. This
Data Prepper does have some open issues for allowing Processors to make better use of the threading model. So this could be improved in the future. But, for now, it would be either sequential or create its own threads.
Ease-of-Use
Having the
Pipeline Visualization
I tried to create a similar visualization for your proposal. I'm pasting it below. If you believe there is a better visualization, please share. I do think the visualizations for the other approaches are more intuitive.
Flexibility
I expect that this approach is just as flexible as the just-in-time router evaluation. I do not see any loss of flexibility there. As with the just-in-time router evaluation, this solution would have no bearing on processor routing.
Original RFC Proposal
The following comment is here to track the original RFC proposal.
Data Prepper will add a new Event metadata field for routes. Once a route is added, it remains part of the Event as long as that Event exists within Data Prepper. Data Prepper will also include a new processor named
The routing conditionals will use the condition syntax as determined by #1005.
Configuration
The following is an example of the proposed configuration. It is the YAML configuration for the pipeline shown in the diagram above.
Additionally, pipeline authors can use the
Data Prepper will provide both the
Route Evaluation
The
This allows pipeline authors to apply routes using fields which they wish to remove or alter later. The following example shows how the
Implementation
The
This is different from the original
I propose adding the
Model Changes
The
I'm re-opening this RFC for discussion based on a comment in a related PR - #1681 (review). The current YAML design proposed in this RFC is pasted again below.
The
Some of the reasoning for this structure:
It has a few downsides worth noting:
Overall, I think that the compact and clear syntax will help more users. But I'd like to get other thoughts.
Some New Alternatives
Additionally, we can consider breaking the syntax with a few possible forms:
Alternative 1: Combine router and sinks. There would no longer be a
Alternative 2: Invert how we generate plugins. This could be combined with #1025 to move
Alternative 3: Again using #1025, if plugin ids are required, we can use them in the router.
We are defining routes and their destinations separately. Can this be (mis-)used as a dropping mechanism? All http-logs would not have a sink in the example below. This may be an implementation detail, but it is something I wanted to call out.
The gains to the pipeline author experience for configuring sinks according to the current RFC outweigh some of the possible issues which arise from confusion to plugin developers. So I'll continue to keep the
One change I'd like to suggest from the current RFC is to rename the pipeline-level
The following puts this all together.
Thanks @dlvenable. I agree with your logic for prioritizing for the pipeline author experience. I like the suggestion to rename
I'm closing this issue since this is the design we are working toward now as part of #1337.
Background
Data Prepper pipelines currently do not support conditionals or routing. Thus, all events in Data Prepper must flow through all sinks and processors in a pipeline. Many users require the ability to route events to different sinks and processors depending on the specific event.
The following diagram outlines a common scenario: Users need to route data to different sinks depending on some property of the event.
Proposal
This RFC introduces a concept of a router to Data Prepper. Pipeline authors can define named routes in the router. Data Prepper will apply routes to individual Events before sending them to Sinks.
This GitHub issue focuses on using routing to route Events to sinks. See the #522 RFC for a proposal for routing through a processor chain.
The following diagram outlines where the router will sit and what it will perform.
Design
Data Prepper will introduce a new router component to the pipeline. This is at the same level of the YAML as the prepper and sink. The router will run after the Processor chain and before the Sinks. Data Prepper would evaluate these routes directly before passing the Events into the sinks.
Any sink with the routes property will only accept Events which match at least one of the routes. In the example above, application-logs is a named route. Data Prepper will only route events with the application-logs route to the first opensearch sink.
By default, Data Prepper will route all Events to a sink which does not define a route. Thus, in the example above, all Events will go into the third opensearch sink.
Alternatives
See the comments below for alternatives.
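The sink-selection rule described in the Design section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Data Prepper implementation: `Sink`, `route_events`, and the `log_type` field used in the usage note are hypothetical, and route conditions are modeled as plain callables. The sketch encodes the two rules from the text: a sink with a routes property accepts an Event only if at least one of its routes matches, and a sink without routes accepts every Event.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, object]  # hypothetical stand-in for a Data Prepper Event


@dataclass
class Sink:
    """Hypothetical sink; an empty `routes` list means it accepts all Events."""

    name: str
    routes: List[str] = field(default_factory=list)
    received: List[Event] = field(default_factory=list)


def route_events(
    events: List[Event],
    route_conditions: Dict[str, Callable[[Event], bool]],
    sinks: List[Sink],
) -> None:
    """Evaluate the named routes once per Event, then deliver to matching sinks."""
    for event in events:
        matched = {
            name for name, condition in route_conditions.items() if condition(event)
        }
        for sink in sinks:
            # No routes defined: default sink. Otherwise at least one must match.
            if not sink.routes or matched.intersection(sink.routes):
                sink.received.append(event)
```

For example, with an `application-logs` route defined as `lambda e: e.get("log_type") == "application"`, a sink configured with that route would receive only application Events, while a sink with no routes would receive all of them. If every sink defines routes, an Event matching none of them reaches no sink at all, which is the dropping behavior raised in the comments.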