Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ParseJsonProcessor initial implementation #1688

Conversation

finnroblin
Copy link
Contributor

Description

Starting point to implement the parse_json processor. This PR adds a configuration class, unit tests, and implementation of core features of the parse_json processor. With these changes the processor can deserialize JSON strings and will deserialize all nested fields. A JSON pointer feature to parse only a part of the JSON still needs to be added. The behavior of deserialized JSON Arrays still needs to be tested and some modifications to the logic might be necessary.

Issues Resolved

Step towards resolving #831

Check List

  • New functionality includes testing.
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@finnroblin finnroblin requested a review from a team as a code owner August 23, 2022 18:56
*
* @return The name of the destination field.
*/
public String getDestination() {
Copy link
Contributor Author

@finnroblin finnroblin Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better for the default destination (Event root) to be / instead of null. Then we avoid some null values and my opinion is that / is more declarative. What do other people think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we used / as the root what would the expected behavior behavior be if someone had multiple /s in the destination? One might expect it to create a nested field similar to directory syntax .

ex: Would destination: /foo/bar become foo.bar: <parsed_json> where bar is a nested field?

That might be useful, but I don't think that's what we're going for here so maybe / would be misleading. I do think your point about a more declarative default is valid though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the behavior if someone had multiple / in the destination (/foo/bar) is that the parsed json will be written to the most nested field in the destination. So if the json is {"key":"value"} and the destination is (/foo/bar), event.get("/foo/bar", Map.class) == {"key":"value"} and event.get("/foo/bar/key", String.class) == "value". Right now the parse_json processor doesn't flatten nested fields (so /foo/bar is not flattened to foo.bar), it just puts any nested json into the event without changing it.

I do see how / is misleading. It's also misleading for another reason: calling event.put("/", value) will not put value into the event root because the JacksonEvent implementation trims leading slashes. So it would not write value to the event at all.

I think we can stick with a null default and be more declarative by mentioning that destination is null -> write to root in documentation.

travisbenedict
travisbenedict previously approved these changes Aug 24, 2022
final Event event = record.getData();
final String message = event.get(source, String.class);
try {
TypeReference<HashMap<String, Object>> hashMapTypeReference = new TypeReference<HashMap<String, Object>>() {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: final

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching! I'll add final in the next revision or the next parse_json PR to avoid triggering re-review.

*
* @return The name of the destination field.
*/
public String getDestination() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we used / as the root what would the expected behavior behavior be if someone had multiple /s in the destination? One might expect it to create a nested field similar to directory syntax .

ex: Would destination: /foo/bar become foo.bar: <parsed_json> where bar is a nested field?

That might be useful, but I don't think that's what we're going for here so maybe / would be misleading. I do think your point about a more declarative default is valid though

Copy link
Member

@sbayer55 sbayer55 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the new ParseJsonProcessor looks good, thanks for adding it!

implementation 'com.fasterxml.jackson.core:jackson-databind'
}

test {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend setting a minimum test coverage to maintain our high coding standards. Here is an example for another plugin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the recommendation!

I set the threshold to 96% but it's slightly higher in reality because there aren't any unit tests for the prepareForShutdown, isReadyForShutdown, and shutdown methods. These methods are needed to extend AbstractProcessor but they're empty because parse_json doesn't have any special behavior for shutdown. The implemented line and branch coverage are 100%.

@BeforeEach
void setup() {
ParseJsonProcessorConfig defaultConfig = new ParseJsonProcessorConfig();
lenient().when(processorConfig.getSource()).thenReturn(defaultConfig.getSource());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: lenient() can hide unexpected function calls in tests.

@codecov-commenter
Copy link

codecov-commenter commented Aug 25, 2022

Codecov Report

Merging #1688 (949011c) into main (a087000) will decrease coverage by 0.16%.
The diff coverage is 90.32%.

@@             Coverage Diff              @@
##               main    #1688      +/-   ##
============================================
- Coverage     93.58%   93.41%   -0.17%     
- Complexity     1304     1307       +3     
============================================
  Files           170      172       +2     
  Lines          3804     3829      +25     
  Branches        303      303              
============================================
+ Hits           3560     3577      +17     
- Misses          173      181       +8     
  Partials         71       71              
Impacted Files Coverage Δ
...mazon/dataprepper/plugins/prepper/NoOpPrepper.java 0.00% <ø> (ø)
...zon/dataprepper/plugins/prepper/StringPrepper.java 88.88% <ø> (ø)
...lugins/processor/parsejson/ParseJsonProcessor.java 88.88% <88.88%> (ø)
.../processor/parsejson/ParseJsonProcessorConfig.java 100.00% <100.00%> (ø)
.../opensearch/dataprepper/parser/PipelineParser.java 83.33% <0.00%> (-5.24%) ⬇️
...rwarder/discovery/AwsCloudMapPeerListProvider.java 92.68% <0.00%> (-2.44%) ⬇️
...ataprepper/parser/model/PipelineConfiguration.java 97.95% <0.00%> (-2.05%) ⬇️
...prepper/parser/PipelineConfigurationValidator.java 71.18% <0.00%> (+1.69%) ⬆️
.../org/opensearch/dataprepper/pipeline/Pipeline.java 92.85% <0.00%> (+4.28%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Comment on lines 15 to 16
@NotEmpty
@NotNull
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: @notempty also checks for not null. I think @NotNull is required in this case.

Comment on lines 10 to 11
group 'org.opensearch.dataprepper'
version '2.0.0-SNAPSHOT'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these 2 lines. it's not required

Comment on lines 18 to 19
testImplementation 'org.junit.jupiter:junit-jupiter-api:5.8.1'
testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.8.1'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove these dependencies. I think these are used from parent project.

@finnroblin finnroblin requested a review from sbayer55 August 26, 2022 15:39
Copy link
Member

@sbayer55 sbayer55 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks Finn!

@asifsmohammed asifsmohammed merged commit 6757385 into opensearch-project:main Aug 26, 2022
engechas pushed a commit to engechas/data-prepper that referenced this pull request Sep 12, 2022
* Initial implementation of parse_json processor

Signed-off-by: Finn Roblin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants