[BUG] Exception in substitute string processor shuts down processor work but not pipeline #2956

Closed
engechas opened this issue Jun 30, 2023 · 4 comments · Fixed by #3846
Labels: bug, Priority-High

@engechas
Collaborator

Describe the bug
When the substitute string processor encounters an exception while processing, it throws that exception and shuts down the process worker thread. The rest of the pipeline remains active but unable to make progress on incoming data. Example exception that shut down a process worker thread:

2023-06-30T07:11:01.606 [s3-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.pipeline.ProcessWorker - Encountered exception during pipeline s3-pipeline processing
java.lang.IllegalArgumentException: key _<redacted> must contain only alphanumeric chars with .-_ and must follow JsonPointer (ie. 'field/to/key')

To Reproduce
Steps to reproduce the behavior:

  1. Create a pipeline with the substitute string processor
  2. Ingest data that causes an exception to be thrown, for example a key that is not a valid JSON Pointer

Expected behavior
The entire pipeline should shut down rather than remaining partially alive but unable to process data.
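
To illustrate the expected behavior, here is a minimal sketch of a worker that requests a pipeline-wide stop when its processing step throws, instead of dying silently. This is not Data Prepper's actual ProcessWorker; the class name FailFastWorker, the shared stop flag, and the Runnable processing step are all hypothetical names chosen for the example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical worker, NOT Data Prepper's ProcessWorker.
// Assumption: every worker and source in the pipeline polls the same stop flag.
final class FailFastWorker implements Runnable {
    private final Runnable processingStep;
    private final AtomicBoolean pipelineStopRequested;

    FailFastWorker(Runnable processingStep, AtomicBoolean pipelineStopRequested) {
        this.processingStep = processingStep;
        this.pipelineStopRequested = pipelineStopRequested;
    }

    @Override
    public void run() {
        try {
            // Keep processing until some component asks the pipeline to stop.
            while (!pipelineStopRequested.get()) {
                processingStep.run();
            }
        } catch (RuntimeException e) {
            // An unhandled processor exception ends this worker, so signal
            // the rest of the pipeline to stop as well.
            System.err.println("Processor failed, requesting pipeline shutdown: " + e.getMessage());
            pipelineStopRequested.set(true);
        }
    }
}
```

Under this assumption, any other component that checks the same flag would stop accepting data once a worker fails, so the pipeline does not stay half alive.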

@asifsmohammed
Collaborator

@engechas @graytaylor0 Will this PR fix this bug? #2945

@graytaylor0
Member

@asifsmohammed No, that PR will only make the characters ._- valid at the start and end of keys. Other invalid characters in keys could still throw this exception.

@kkondaka
Collaborator

@graytaylor0 @asifsmohammed @dlvenable This happens when the pipeline config has keys with invalid characters. I think we should check this in the config validation. For example, the following config caused this error:

sample-pipeline:
  source:
    http:
  processor:
    - substitute_string:
        entries:
          - source: "mes&sage"
            from: "A"
            to: "B"
  sink:
    - stdout:

In the above config, the source name contains an invalid character (&).
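
A rough sketch of what such a config-time check could look like follows. It is not Data Prepper's actual validation code: the class SubstituteKeyValidator and its methods are hypothetical, and the allowed character set is an assumption taken from the wording of the error message above (alphanumerics, `.`, `-`, `_`, plus `/` as the JSON Pointer separator).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Data Prepper.
final class SubstituteKeyValidator {
    // Assumption: allowed characters are inferred from the error message text,
    // not copied from the real key validation logic.
    private static final Pattern ALLOWED_KEY = Pattern.compile("[A-Za-z0-9._\\-/]+");

    private SubstituteKeyValidator() {
    }

    // Returns true when the whole key consists only of allowed characters.
    static boolean isValidKey(String key) {
        return key != null && ALLOWED_KEY.matcher(key).matches();
    }

    // Collects every configured source key that would be rejected, so that
    // start-up can fail once with a complete list of offending keys.
    static List<String> findInvalidKeys(List<String> sourceKeys) {
        List<String> invalid = new ArrayList<>();
        for (String key : sourceKeys) {
            if (!isValidKey(key)) {
                invalid.add(key);
            }
        }
        return invalid;
    }
}
```

With a check like this run while the pipeline config is loaded, the "mes&sage" entry would be reported before any data is ingested, rather than surfacing later inside a process worker thread.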

@dlvenable
Member

@kkondaka, I have a proposal in #1916 that will help with this by getting the key during processor start-up. It would also help with performance. If we can move toward that, we can have a good solution for a couple of issues we've been having.

@dlvenable dlvenable added this to the v2.6.2 milestone Dec 12, 2023
@github-project-automation github-project-automation bot moved this from Unplanned to Done in Data Prepper Tracking Board Dec 12, 2023