Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Basic Conditional Logic in Processors #522

Open
5 tasks
dlvenable opened this issue Nov 2, 2021 · 11 comments
Open
5 tasks

[RFC] Basic Conditional Logic in Processors #522

dlvenable opened this issue Nov 2, 2021 · 11 comments
Assignees
Labels
backlog proposal Proposed major changes to Data Prepper

Comments

@dlvenable
Copy link
Member

dlvenable commented Nov 2, 2021

Data Prepper currently does not provide any control logic within pipelines. This proposal is to add a basic conditional system for processing data in Preppers only when certain conditions are met.

This proposal is to add a new when field which is available for all Processors.

When a Prepper contains the when field, the condition expressed in the value must be true for the Prepper to process a given event. If the field is not provided, then all events are processed.

The conditional expression should support:

  • Equality operators: ==, >, <, >=, <=
  • Boolean operators: and, or, and not
  • Set operators: in, and not in
  • Regex operators: =~ and !~ which check a pattern on the right against the string on the left
  • Fields will be accessed using JsonPointer as defined in Resolve approach for JSON path and dots in key names #450
  • Sets defined by [] and comma delimited

Thus, an example might be:

preppers:
  ...
  - grok:
       when: "/http/response/status_code and /http/response/status_code >= 400 and /http/response/status_code < 500"
       match: "..."
  - grok:
       when: "/http/response/status_code in [500, 501]"
       match: "..."

Implementation

The AbstractPrepper class can support the when behavior so that individual Prepper implementations do not need to handle the when field.

This will require that AbstractPrepper receive Event types and not any type. This is ongoing work in #319. Making this change in AbstractPrepper is a breaking change though since it does not require the Event type currently.

The AbstractPrepper will only call doExecute for records which meet the conditional expression provided by when.

Tasks

  • Create ANTLR Parser
  • Create statement evaluator
  • Add when property to Abstract Processor (scope pending)
  • Create logstash config converter
  • Finalize scope for 1.3
@dlvenable dlvenable added untriaged proposal Proposed major changes to Data Prepper labels Nov 2, 2021
@dlvenable dlvenable added this to the v1.3 milestone Nov 16, 2021
@ryn9
Copy link

ryn9 commented Dec 8, 2021

The given example seems to indicate that all preppers would be iterated through and each where cause evaluated. Would there be a way to short circuit the rest of the preppers in the event a 'when' clause is met? Though would allow for a more conventual if-then-else style evaluation.

@dlvenable
Copy link
Member Author

@ryn9, Thanks for the feedback. This proposal does not include conditionals any more complicated than skipping a single Prepper/Processor.

Would the current proposal be of value to you? Do you need something more like if-then-else? My thinking is that both could exist. The proposed when approach may yield clear and compact pipelines when it is sufficient. But, an if-then-else could be available to use when it is needed.

@ryn9
Copy link

ryn9 commented Dec 9, 2021

So long as if-then-else is coming - sounds good to me :)

@sbayer55
Copy link
Member

Two possible alternatives that allow branching logic to be represented with nested components.

  1. Data Prepper could leverage the option to chain multiple pipelines to represent branching logic. Adding a new sink named "conditional" that contains an array of {if: "condition", sink: [sink1, sink2]}. Each subsiquent pipeline would need to handle a branch in contorl flow.
  source:
    ...
  sink:
    - conditional:
      - if: "/http/response/status_code and /http/response/status_code >= 400 and /http/response/status_code < 500"
        sink:
          - pipeline:
              name: "pipeline-a"
    - conditional:
      - if: false
        sink: []
      - if: /http/response/status_code in [500, 501, 502]
        sink:
          - pipeline:
              name: "pipeline-b"
      - if: true # effectivly else
        sink:
          - pipeline:
              name: "pipeline-a"
          - pipeline:
              name: "pipeline-b"
pipeline-a:
  source:
    pipeline:
      name: "start-pipeline"
  sink:
    ...

pipeline-b:
  source:
    pipeline:
      name: "start-pipeline"
  sink:
    ...```

Another option with a less verbose syntax could be to create a conditional processor. The conditional processor would need to support nested processors. Branching processor logic is cleaner but directing events to different sinks based on conditionals requires a more complex syntax.

```start-pipeline:
  source:
    ...
  processor:
    - conditional:
      - if: "/http/response/status_code and /http/response/status_code >= 400 and /http/response/status_code < 500"
        processor:
          - grok:
            ...
          - another-processor:
            ...
      - if: "/http/response/status_code in [500, 501, 502]"
        processor:
          ...
      - if: true # effectivly else
        processor:
          ...
  sink:
    - file:
      ...```

@dlvenable
Copy link
Member Author

Thanks @sbayer55 for putting that together. I have a similar concept, but it uses a conditional_routing Processor.

Basically, this one Processor could route into other processors depending upon the condition. This wouldn't really require any changes to Data Prepper Core either.

my_pipeline:
  source:
    pipeline:
      name: VAR_PIPELINE_NAME
  processor:
    - mutate:
    - conditional_routing:
        - '/http/response/status_code = 500'
            - conditional_routing:
                - '/http/response_body ~= FAILURE1'
                  - mutate:
                     ....
              - '/http/response_body ~= FAILURE2'
                 - mutate:
                    ....
                - otherwise:
                  - mutate:

We've had some concerns about this type of approach though because it could cause problems with a distributed Data Prepper cluster.

I think we should move forward on the when approach to get a feature out in 1.3 and at the same time start an RFC for different approaches to the branching condtionals.

@dlvenable
Copy link
Member Author

Yes, you also have something similar for the processors. I like your syntax a bit more than mine - it's more readable.

@sbayer55 sbayer55 self-assigned this Jan 21, 2022
@sbayer55
Copy link
Member

sbayer55 commented Jan 21, 2022

Thanks for the feedback @dlvenable! I agree we can grow from the when processor parameter to more complex functionality.

Based on your proposed when syntax I'd like to add more specific details and address a few edge cases. I'll use this sample event for reference:

{
    "Valid \"JSON\"key~with.special/characters\\are=={[fun]}": true,
    "nested": {
        "values": [1, 2, 3]
    },
    "boolean": true,
    "boolean-string": "true",
    "int": 3,
    "float": 3.14159265
}

Compare JSON pointer to static value

Simple check conditional checks:

when: '/boolean == true' # True

# /boolean evaluates to a boolean type
# "true" evaluates to a string type
when: '/boolean == "true"' # false

when: '/boolean-string == "true"' # true
when: 'boolean == true' # false

# JSON Pointers can use integers to select a node in an array at index n
when: '/nested/values/1 == 2' # true

when '2 in /nested/values/' # true

Compare JSON pointers

Compare the value of two pointers
This comparison evaluates to '3 in [1, 2, 3]'

when: '/int is in /nested/values' # true

Handling special characters

If a JSON pointer has a more complex key it will impossible to identify where a JSON point terminates. To address this issue I propose we define two ways to declare a JSON Pointer

  1. Shorthand notation beginning with a / and consisting of alphanumeric, hyphen, and forward slash characters. Regex pattern ^\/[A-Za-z0-9-\/]*[A-Za-z0-9-]$
  2. Escaped notation JSON Pointers beginning in "/ and terminating with a " not preceded by an escape character \. The escape character \ used to mark the following character to be evaluated as literal text. Example: You would use "/not \"in\"" to select the key not "in". Key in context: {"not \"in\"": true}.

Escaped notation JSON Pointer to select the key Valid \"JSON\"key~with.special/characters\\are=={[fun]}.

when '"/Valid \"JSON\"key~0with.special~1characters\\are=={[fun]}" == true'

JSON keys use the escape character \. For example the "\"\\" evaluates to "\.
JSON pointer escape characters:

  1. ~0 representing ~
  2. ~1 representing /

Order of operations

Evaluate in order from top to bottom, then left to right for each row

[]
()
in, not in
<, <=, >, >=
=~, !~
==, !=
and, or, not

Examples:

# Evaluates to true:
true == (true or true)
true or false and true

# Evaluates to false:
not true or false

# Evaluates to true:
(not false) or false

White space

A single space characters surrounding operators and a single following commas in a set are options

The following are equivalent:

true == true
true== true
true==true

The following are equivalent:

[1, 2, 3]
[1,2,3]

The following will throw an exception:

true         == true

Given this JSON Object

{
    "JSON        pointer": true
}

The following would evaluate to true

"/JSON        pointer" == true

Character set

I suggest we only allow ascii characters 32-126 to limit parsing complexity and mitigate risk from malformed strings.

@sbayer55
Copy link
Member

To support a when condition on all processes by default I propose following modifications to AbstractProcessor

InputRecord OutputRecord type mismatch

AbstractProcessor defines an input and output record type. If the when condition should cause events to be excluded from the doExecute method and InputRecord.class != OutputRecord.class then then AbstractProcessor would not know how to transform the record.

Out of scope: In an enhancement consider adding support for mismatched types with an output type of String. Then transform the input records using the .toString method.

InputRecord type support.

If the when condition contains a JSON Pointer then AbstractProcessor should only allow InputRecord with a value of one of the following types

  • String
  • JsonNode
  • Event

Out of scope: In an enhancement consider adding support for a way to reference Record.getData(). This might be useful for a condition like when: Record.getData() != "" to skip processing empty strings.

String

Attempt to parse as JSON with Jackson ObjectMapper.readTree(String). Throw exception if parsing fails, otherwise continue as JSON Node as success

JsonNode

Evaluate conditional statement using JsonNode.at(String) to resolve JSON Pointers

Event

Convert event to JSON using Event.toJsonString(). Then handle as type String.

When condition evaluation exception

I propose we add a flag onWhenEvalException configuration property to processes that supports the following values:

  • drop-event
  • drop-event-silent
  • force-true
  • force-false

drop-event

Do not include the event in the doExecute records parameter
Produces a log message for each dropped event

drop-event-silent

Do not include the event in the doExecute records parameter
No LOG message produced
Logs a warning at startup that events may be dropped silently

force-true

Continue processing the event as if the when condition evaluated as true

force-false

Continue processing the event as if the when condition evaluated as false

@dlvenable
Copy link
Member Author

dlvenable commented Feb 8, 2022

If Data Prepper uses the Conditional Routing approach from #1007, this RFC should change to match it.

I believe the following two changes are necessary:

  1. Remove the when property in favor of the route configuration.
  2. Implement the routing logic in Data Prepper Core instead of in the AbstractProcessor class.

@dlvenable
Copy link
Member Author

@ryn9 , You expressed interest in if-else-elseif conditions within Data Prepper. I created a new proposal for conditional routing in #1007. It represents my current thinking on how to handle routing with Data Prepper.

I'd like to understand your use-case(s) for using complex conditionals like if-else-elseif. Data Prepper currently follows a pipeline model and not a programming model. I'd like to see if other concepts can meet your needs rather than if-else-elseif. Please comment on the proposal for #1007 and let us know if there are other scenarios which it does not solve for you.

@dlvenable
Copy link
Member Author

The discussion of the exact syntax Data Prepper will use has been moved into #1005.

@dlvenable dlvenable removed this from the v1.3 milestone Feb 18, 2022
@dlvenable dlvenable changed the title [RFC] Basic Conditional Logic in Preppers [RFC] Basic Conditional Logic in Processors Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog proposal Proposed major changes to Data Prepper
Projects
Development

No branches or pull requests

3 participants