-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for nested syntax in converter #1088
Added support for nested syntax in converter #1088
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1088 +/- ##
=========================================
Coverage 91.32% 91.32%
Complexity 692 692
=========================================
Files 85 85
Lines 2017 2017
Branches 170 170
=========================================
Hits 1842 1842
Misses 133 133
Partials 42 42 Continue to review full report at Codecov.
|
|
||
private static String convertAttributeValueNestedSyntax(Object attributeValue) { | ||
StringBuilder convertedAttribute = new StringBuilder(); | ||
String nestedSyntaxRegex = "\\[([^\\]]+)\\]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making them static constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex has some flaws in it. For example, things like [foo[bar]
will pass. After the first [foo]
is found, you are not verifying any more (so [foo]anythingcangohere
passes. I think something closer to this pattern should work to allow one or more non-empty []
that does not contain brackets inside of it. That way we can verify the entire string matches first, and then we can go through and match on each bracket.
(\\[([^\\]\\[]+)\\])+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The easiest way to do use this pattern would be something like this. I am sure there is a way to go to each [foo]
section and append a /
like you are doing, but this way may be easier
Matcher matcher = NESTED_SYNTAX_PATTERN.matcher(logstashString);
if (matcher.matches() {
String result = logstatshString.replaceAll("\\]\\[", "/").replaceAll("\\[|\\]", "/");
return result;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback Taylor. I'll work on regex. It's very hard to find the regex for complete nested syntax string. That's the reason I'm looking for groups of []. One way to do it is just check starting and ending characters like^\[.+\]$
.
I also wrote the regex based on the assumption that we get valid Logstash configuration.
import java.util.regex.Matcher; | ||
import java.util.regex.Pattern; | ||
|
||
class NestedSyntaxConverterUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making the method parameters and other variables final.
@@ -66,20 +66,22 @@ protected void mapCustomAttributes(List<LogstashAttribute> logstashAttributes, L | |||
if (logstashGrokMatchValueType.equals(LogstashValueType.HASH)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, we should consider refactoring the if-else block. We can create an interface and implementation for specific type of match attribute (HASH vs ARRAY vs Others...) to follow single responsibility principle and keep code maintainable.
@@ -138,4 +139,21 @@ void mapAttributes_with_negation_expression_negates_boolean_value(boolean inputV | |||
assertThat(actualPluginModel.get(0).getPluginSettings().size(), equalTo(1)); | |||
assertThat(actualPluginModel.get(0).getPluginSettings().get(dataPrepperAttribute.substring(1)), equalTo(!inputValue)); | |||
} | |||
|
|||
@ParameterizedTest | |||
@CsvSource({"[outer_key][inner_key],/outer_key/inner_key", "[array][0][key], /array/0/key"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add one case for single brackets (ex: ["message"])
private NestedSyntaxConverterUtil() {} | ||
|
||
public static Object checkAndConvertLogstashNestedSyntax(Object logstashAttributeValue) { | ||
if (logstashAttributeValue instanceof String && ((String) logstashAttributeValue).startsWith("[") && ((String) logstashAttributeValue).endsWith("]")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check shouldn't be necessary if the regex pattern is correct. I would instead compile the regex pattern as a static variable in the class, and do a check for if the logstashAttributeValue matches the regex pattern.
|
||
private static String convertAttributeValueNestedSyntax(Object attributeValue) { | ||
StringBuilder convertedAttribute = new StringBuilder(); | ||
String nestedSyntaxRegex = "\\[([^\\]]+)\\]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex has some flaws in it. For example, things like [foo[bar]
will pass. After the first [foo]
is found, you are not verifying any more (so [foo]anythingcangohere
passes. I think something closer to this pattern should work to allow one or more non-empty []
that does not contain brackets inside of it. That way we can verify the entire string matches first, and then we can go through and match on each bracket.
(\\[([^\\]\\[]+)\\])+
0ed5f58
to
4ec61bc
Compare
} | ||
|
||
@Override | ||
protected HashSet<String> getCustomMappedAttributeNames() { | ||
return new HashSet<>(Arrays.asList(LOGSTASH_GROK_MATCH_ATTRIBUTE_NAME, LOGSTASH_GROK_PATTERN_DEFINITIONS_ATTRIBUTE_NAME)); | ||
return new HashSet<>(Arrays.asList(LOGSTASH_GROK_MATCH_ATTRIBUTE_NAME, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving this to static constant.
logstashAttributes.forEach(logstashAttribute -> { | ||
if (logstashAttribute.getAttributeName().equals(LOGSTASH_GROK_MATCH_ATTRIBUTE_NAME)) { | ||
matchAttributes.add(logstashAttribute); | ||
} else if (logstashAttribute.getAttributeName().equals(LOGSTASH_GROK_PATTERN_DEFINITIONS_ATTRIBUTE_NAME)) { | ||
patternDefinitions.putAll((Map<String, String>) logstashAttribute.getAttributeValue().getValue()); | ||
} else if (logstashAttribute.getAttributeName().equals(LOGSTASH_GROK_OVERWRITE_ATTRIBUTE_NAME)) { | ||
keysToOverwrite.addAll((List<String>) logstashAttribute.getAttributeValue().getValue()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is doing multiple things. Consider moving the responsibility of each if condition to separate class to make this class more maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean extract this into a method? Creating a new class for this would be confusing for plugin authors for adding converter logic. I can extract it to a separate method.
4e17bcb
to
b42bfb4
Compare
b42bfb4
to
0be236f
Compare
Matcher grokMatchPatternMatcher = GROK_MATCH_PATTERN.matcher(matchPattern); | ||
StringBuilder convertedGrokMatchPattern = new StringBuilder(); | ||
while(grokMatchPatternMatcher.find()) { | ||
convertedGrokMatchPattern.append(grokMatchPatternMatcher.group(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are magic numbers. What do they mean? We should define constants for them that describe what they mean. What is in group 1? Group 2? etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add group names for regex.
|
||
import java.util.Map; | ||
|
||
public interface GrokLogstashPluginAttributeMapperHelper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be package protected. I'd also be happy to see them as static inner classes because their scope of use is so small.
* @return A Map | ||
* @since 1.2 | ||
*/ | ||
List<String> getNestedSyntaxAttributeNames(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach has helped reduce complexity for the simpler scenarios such as kv
.
@@ -138,4 +139,22 @@ void mapAttributes_with_negation_expression_negates_boolean_value(boolean inputV | |||
assertThat(actualPluginModel.get(0).getPluginSettings().size(), equalTo(1)); | |||
assertThat(actualPluginModel.get(0).getPluginSettings().get(dataPrepperAttribute.substring(1)), equalTo(!inputValue)); | |||
} | |||
|
|||
@ParameterizedTest | |||
@CsvSource({"[outer_key][inner_key],/outer_key/inner_key", "[array][0][key], /array/0/key", "[message], /message"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
2cb6183
to
d64f16c
Compare
Matcher grokMatchPatternMatcher = GROK_MATCH_PATTERN.matcher(matchPattern); | ||
StringBuilder convertedGrokMatchPattern = new StringBuilder(); | ||
while(grokMatchPatternMatcher.find()) { | ||
convertedGrokMatchPattern.append(grokMatchPatternMatcher.group("openingBrackets")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to define these (openiingBrackets, grokSyntax, etc) as constants. That way there's only one place that they are defined. That way if someone changes the regex pattern it doesn't break the code below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback Phill. I made the changes to the logic to remove regex groups as it is getting complex.
final List<LogstashAttribute> matchAttributes = new ArrayList<>(); | ||
final Map<String, String> patternDefinitions = new HashMap<>(); | ||
GrokLogstashPluginAttributeMapperHelper matchAttributesHelper = new GrokMatchAttributeHelper(); | ||
GrokLogstashPluginAttributeMapperHelper patternDefinitionsAttributeHelper = new GrokPatternDefinitionsAttributeHelper(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to constructor.
|
||
import static org.opensearch.dataprepper.logstash.mapping.GrokLogstashPluginAttributesMapper.LOGSTASH_GROK_MATCH_ATTRIBUTE_NAME; | ||
|
||
class GrokMatchAttributeHelper implements GrokLogstashPluginAttributeMapperHelper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add documentation
return convertedGrokMatchPattern.toString(); | ||
} | ||
|
||
private static String getConvertedMatchPattern(String matchPatternGroup) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: final to method parameter and other variables in this class.
5729383
to
6b0690a
Compare
import java.util.regex.Matcher; | ||
import java.util.regex.Pattern; | ||
|
||
public class GrokMatchUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class also needs to be package protected.
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
final Map<String, GrokLogstashPluginAttributeMapperHelper> helperInstances = new HashMap<>(); | ||
helperInstances.put(LOGSTASH_GROK_MATCH_ATTRIBUTE_NAME, matchAttributesHelper); | ||
helperInstances.put(LOGSTASH_GROK_PATTERN_DEFINITIONS_ATTRIBUTE_NAME, patternDefinitionsAttributeHelper); | ||
helperInstances.put(LOGSTASH_GROK_OVERWRITE_ATTRIBUTE_NAME, overwriteAttributeHelper); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be moved to constructor as well.
Define helperInstances as class instance variable and initialize in the constructor.
private final Map<String, GrokLogstashPluginAttributeMapperHelper> helperInstances
Signed-off-by: Asif Sohail Mohammed <[email protected]>
eda8dcc
to
52bdbcc
Compare
Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed [email protected]
Description
This PR only covers nested syntax for
Date
,Grok
(key in match, named captures) andDefaultLogstashPluginAttributesMapper
List of plugin configurations where nested syntax can be used in Logstash
Grok
Date
Mutate
KV
Integration tests will give a high level idea of where nested syntax can be used in different processors. Files for Inegration tests are
log-ingest-to-opensearch-nested-syntax.conf
andlog-ingest-to-opensearch-nested-syntax.expected.yaml
Issues Resolved
#627
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.