Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for nested syntax in converter #1088

Merged
merged 15 commits into from
Mar 6, 2022

Conversation

asifsmohammed
Copy link
Collaborator

@asifsmohammed asifsmohammed commented Feb 24, 2022

Signed-off-by: Asif Sohail Mohammed [email protected]

Description

This PR only covers nested syntax for Date, Grok (key in match, named captures) and DefaultLogstashPluginAttributesMapper

Logstash nested syntax: [outer][inner]
Data Prepper JSON pointer: /outer/inner

List of plugin configurations where nested syntax can be used in Logstash

  1. Grok

    • match
    • pattern_definitions
    • overwrite
  2. Date

    • match
    • target
  3. Mutate

    • add
    • rename
    • copy
    • delete
  4. KV

    • surce
    • target

Integration tests will give a high level idea of where nested syntax can be used in different processors. Files for Inegration tests are log-ingest-to-opensearch-nested-syntax.conf and log-ingest-to-opensearch-nested-syntax.expected.yaml

Issues Resolved

#627

Check List

  • New functionality includes testing.
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@asifsmohammed asifsmohammed requested a review from a team as a code owner February 24, 2022 06:05
@asifsmohammed asifsmohammed marked this pull request as draft February 24, 2022 06:10
@codecov-commenter
Copy link

codecov-commenter commented Feb 24, 2022

Codecov Report

Merging #1088 (6b0690a) into main (9dce183) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##               main    #1088   +/-   ##
=========================================
  Coverage     91.32%   91.32%           
  Complexity      692      692           
=========================================
  Files            85       85           
  Lines          2017     2017           
  Branches        170      170           
=========================================
  Hits           1842     1842           
  Misses          133      133           
  Partials         42       42           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9dce183...6b0690a. Read the comment docs.


private static String convertAttributeValueNestedSyntax(Object attributeValue) {
StringBuilder convertedAttribute = new StringBuilder();
String nestedSyntaxRegex = "\\[([^\\]]+)\\]";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making them static constant.

Copy link
Member

@graytaylor0 graytaylor0 Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex has some flaws in it. For example, things like [foo[bar] will pass. After the first [foo] is found, you are not verifying any more (so [foo]anythingcangohere passes. I think something closer to this pattern should work to allow one or more non-empty [] that does not contain brackets inside of it. That way we can verify the entire string matches first, and then we can go through and match on each bracket.

(\\[([^\\]\\[]+)\\])+

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The easiest way to do use this pattern would be something like this. I am sure there is a way to go to each [foo] section and append a / like you are doing, but this way may be easier

Matcher matcher = NESTED_SYNTAX_PATTERN.matcher(logstashString);

if (matcher.matches() {
     String result =  logstatshString.replaceAll("\\]\\[", "/").replaceAll("\\[|\\]", "/");
     return result;
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback Taylor. I'll work on regex. It's very hard to find the regex for complete nested syntax string. That's the reason I'm looking for groups of []. One way to do it is just check starting and ending characters like^\[.+\]$.

I also wrote the regex based on the assumption that we get valid Logstash configuration.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedSyntaxConverterUtil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making the method parameters and other variables final.

@@ -66,20 +66,22 @@ protected void mapCustomAttributes(List<LogstashAttribute> logstashAttributes, L
if (logstashGrokMatchValueType.equals(LogstashValueType.HASH)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, we should consider refactoring the if-else block. We can create an interface and implementation for specific type of match attribute (HASH vs ARRAY vs Others...) to follow single responsibility principle and keep code maintainable.

@@ -138,4 +139,21 @@ void mapAttributes_with_negation_expression_negates_boolean_value(boolean inputV
assertThat(actualPluginModel.get(0).getPluginSettings().size(), equalTo(1));
assertThat(actualPluginModel.get(0).getPluginSettings().get(dataPrepperAttribute.substring(1)), equalTo(!inputValue));
}

@ParameterizedTest
@CsvSource({"[outer_key][inner_key],/outer_key/inner_key", "[array][0][key], /array/0/key"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add one case for single brackets (ex: ["message"])

private NestedSyntaxConverterUtil() {}

public static Object checkAndConvertLogstashNestedSyntax(Object logstashAttributeValue) {
if (logstashAttributeValue instanceof String && ((String) logstashAttributeValue).startsWith("[") && ((String) logstashAttributeValue).endsWith("]")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check shouldn't be necessary if the regex pattern is correct. I would instead compile the regex pattern as a static variable in the class, and do a check for if the logstashAttributeValue matches the regex pattern.


private static String convertAttributeValueNestedSyntax(Object attributeValue) {
StringBuilder convertedAttribute = new StringBuilder();
String nestedSyntaxRegex = "\\[([^\\]]+)\\]";
Copy link
Member

@graytaylor0 graytaylor0 Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex has some flaws in it. For example, things like [foo[bar] will pass. After the first [foo] is found, you are not verifying any more (so [foo]anythingcangohere passes. I think something closer to this pattern should work to allow one or more non-empty [] that does not contain brackets inside of it. That way we can verify the entire string matches first, and then we can go through and match on each bracket.

(\\[([^\\]\\[]+)\\])+

@asifsmohammed asifsmohammed force-pushed the nested-syntax branch 2 times, most recently from 0ed5f58 to 4ec61bc Compare February 27, 2022 18:37
}

@Override
protected HashSet<String> getCustomMappedAttributeNames() {
return new HashSet<>(Arrays.asList(LOGSTASH_GROK_MATCH_ATTRIBUTE_NAME, LOGSTASH_GROK_PATTERN_DEFINITIONS_ATTRIBUTE_NAME));
return new HashSet<>(Arrays.asList(LOGSTASH_GROK_MATCH_ATTRIBUTE_NAME,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving this to static constant.

Comment on lines 33 to 40
logstashAttributes.forEach(logstashAttribute -> {
if (logstashAttribute.getAttributeName().equals(LOGSTASH_GROK_MATCH_ATTRIBUTE_NAME)) {
matchAttributes.add(logstashAttribute);
} else if (logstashAttribute.getAttributeName().equals(LOGSTASH_GROK_PATTERN_DEFINITIONS_ATTRIBUTE_NAME)) {
patternDefinitions.putAll((Map<String, String>) logstashAttribute.getAttributeValue().getValue());
} else if (logstashAttribute.getAttributeName().equals(LOGSTASH_GROK_OVERWRITE_ATTRIBUTE_NAME)) {
keysToOverwrite.addAll((List<String>) logstashAttribute.getAttributeValue().getValue());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is doing multiple things. Consider moving the responsibility of each if condition to separate class to make this class more maintainable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean extract this into a method? Creating a new class for this would be confusing for plugin authors for adding converter logic. I can extract it to a separate method.

@asifsmohammed asifsmohammed force-pushed the nested-syntax branch 2 times, most recently from 4e17bcb to b42bfb4 Compare March 2, 2022 22:26
@asifsmohammed asifsmohammed marked this pull request as ready for review March 2, 2022 22:27
Matcher grokMatchPatternMatcher = GROK_MATCH_PATTERN.matcher(matchPattern);
StringBuilder convertedGrokMatchPattern = new StringBuilder();
while(grokMatchPatternMatcher.find()) {
convertedGrokMatchPattern.append(grokMatchPatternMatcher.group(1));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are magic numbers. What do they mean? We should define constants for them that describe what they mean. What is in group 1? Group 2? etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add group names for regex.


import java.util.Map;

public interface GrokLogstashPluginAttributeMapperHelper {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be package protected. I'd also be happy to see them as static inner classes because their scope of use is so small.

* @return A Map
* @since 1.2
*/
List<String> getNestedSyntaxAttributeNames();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach has helped reduce complexity for the simpler scenarios such as kv.

@@ -138,4 +139,22 @@ void mapAttributes_with_negation_expression_negates_boolean_value(boolean inputV
assertThat(actualPluginModel.get(0).getPluginSettings().size(), equalTo(1));
assertThat(actualPluginModel.get(0).getPluginSettings().get(dataPrepperAttribute.substring(1)), equalTo(!inputValue));
}

@ParameterizedTest
@CsvSource({"[outer_key][inner_key],/outer_key/inner_key", "[array][0][key], /array/0/key", "[message], /message"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Matcher grokMatchPatternMatcher = GROK_MATCH_PATTERN.matcher(matchPattern);
StringBuilder convertedGrokMatchPattern = new StringBuilder();
while(grokMatchPatternMatcher.find()) {
convertedGrokMatchPattern.append(grokMatchPatternMatcher.group("openingBrackets"));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to define these (openiingBrackets, grokSyntax, etc) as constants. That way there's only one place that they are defined. That way if someone changes the regex pattern it doesn't break the code below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback Phill. I made the changes to the logic to remove regex groups as it is getting complex.

final List<LogstashAttribute> matchAttributes = new ArrayList<>();
final Map<String, String> patternDefinitions = new HashMap<>();
GrokLogstashPluginAttributeMapperHelper matchAttributesHelper = new GrokMatchAttributeHelper();
GrokLogstashPluginAttributeMapperHelper patternDefinitionsAttributeHelper = new GrokPatternDefinitionsAttributeHelper();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to constructor.


import static org.opensearch.dataprepper.logstash.mapping.GrokLogstashPluginAttributesMapper.LOGSTASH_GROK_MATCH_ATTRIBUTE_NAME;

class GrokMatchAttributeHelper implements GrokLogstashPluginAttributeMapperHelper {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add documentation

return convertedGrokMatchPattern.toString();
}

private static String getConvertedMatchPattern(String matchPatternGroup) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: final to method parameter and other variables in this class.

@dinujoh dinujoh self-requested a review March 3, 2022 22:49
@asifsmohammed asifsmohammed force-pushed the nested-syntax branch 2 times, most recently from 5729383 to 6b0690a Compare March 4, 2022 04:55
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrokMatchUtil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class also needs to be package protected.

Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
dinujoh
dinujoh previously approved these changes Mar 4, 2022
@dinujoh dinujoh self-requested a review March 4, 2022 19:15
Comment on lines 36 to 39
final Map<String, GrokLogstashPluginAttributeMapperHelper> helperInstances = new HashMap<>();
helperInstances.put(LOGSTASH_GROK_MATCH_ATTRIBUTE_NAME, matchAttributesHelper);
helperInstances.put(LOGSTASH_GROK_PATTERN_DEFINITIONS_ATTRIBUTE_NAME, patternDefinitionsAttributeHelper);
helperInstances.put(LOGSTASH_GROK_OVERWRITE_ATTRIBUTE_NAME, overwriteAttributeHelper);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be moved to constructor as well.

Define helperInstances as class instance variable and initialize in the constructor.
private final Map<String, GrokLogstashPluginAttributeMapperHelper> helperInstances

dlvenable
dlvenable previously approved these changes Mar 4, 2022
Signed-off-by: Asif Sohail Mohammed <[email protected]>
dlvenable
dlvenable previously approved these changes Mar 4, 2022
@asifsmohammed asifsmohammed merged commit 890e026 into opensearch-project:main Mar 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants