[BUG] Dots Discovered Key Names #4977
Comments
Thanks for reporting this issue. This is actually a conflict between different field types in OpenSearch, and the document is rejected during indexing because of it. The issue arises because OpenSearch interprets dots "." in field names as paths into nested JSON objects. Let me take your sample data and reduce it a little to illustrate. Say we want to index just the following document in OpenSearch:
{
"labels": {
"app": "fooservice",
"app.kubernetes.io/component": "foo"
}
}
OpenSearch expands the dotted key into nested objects:
{
"labels": {
"app": "fooservice",
"app": { // Error, is "app" a string or an object?
"kubernetes": {
"io/component": "foo"
}
}
}
}
This issue happens a lot when logging K8s labels or annotations. It would also occur if Fluent Bit wrote to OpenSearch directly, so it is not a bug in Data Prepper per se.
You can work around this issue by replacing the dots "." with underscores "_" using a small Lua script in Fluent Bit. We have developed this snippet for our own use cases. Such a transformation is usually known as dedotting, in case you want to google it.
Data Prepper faces a similar issue for OpenTelemetry attributes. There, its processors dedot the attribute names by replacing certain dots "." with "@". In that case, the dedotting is hard-coded into the OpenTelemetry processors of Data Prepper. I am not experienced enough with the generic Data Prepper processors to give an example using those. The main problem, to me, is that you would not want to list every field name that should be dedotted in the pipeline configuration. In your example, it could be applied to all fields under
Note that any dedotting procedure increases the divide between deployment and observability due to the altered names. Unfortunately, there is no easy way around this. The unfolding of dotted names is a major feature of OpenSearch.
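To make the conflict described above concrete, here is a minimal Python sketch that expands dotted keys into nested objects and fails when a field would have to be both a plain value and an object. It is a simplified model for illustration only, not OpenSearch's actual mapping logic; the names `expand_dotted` and `_insert` are hypothetical.

```python
def expand_dotted(doc):
    """Expand dotted keys into nested dicts, raising on type conflicts."""
    result = {}
    for key, value in doc.items():
        _insert(result, key.split("."), value)
    return result


def _insert(node, parts, value):
    head, rest = parts[0], parts[1:]
    if rest or isinstance(value, dict):
        # `head` must be an object here, either because more path
        # segments follow or because the value itself is an object.
        child = node.setdefault(head, {})
        if not isinstance(child, dict):
            raise ValueError(
                f"field {head!r} is already a plain value, cannot also be an object"
            )
        if rest:
            _insert(child, rest, value)
        else:
            for key, item in value.items():
                _insert(child, key.split("."), item)
    elif head in node:
        raise ValueError(
            f"field {head!r} is already defined, cannot also be a plain value"
        )
    else:
        node[head] = value


# The reduced document from the comment above.
doc = {"labels": {"app": "fooservice", "app.kubernetes.io/component": "foo"}}
try:
    expand_dotted(doc)
except ValueError as err:
    print(f"rejected: {err}")
```

With the plain `app` label removed, the same function returns the nested structure without error, which matches the observation that ingestion succeeds when the conflicting labels are absent.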
Thanks for the lead. For whatever reason, doing this fixed it. All the labels and the timestamp still show up in OpenSearch, so it is somewhat puzzling.
@KarstenSchnitter, thank you for the detailed comment. Do you think having a
@Conklin-Spencer-bah, I think deleting
This is also why you needed to delete
Somewhat relatedly, we are working on dynamic key renaming in #4849. The approach there is to support renaming keys by pattern. Still, dedotting seems common enough to possibly warrant its own processor.
Describe the bug
Keys containing a dot "." cannot be processed.
When ingesting logs via FluentBit -> S3 -> SQS -> Data Prepper / OSIS -> OpenSearch, any key that has a dot "." in it throws an error on ingestion; see the error from OSIS below. I believe this is because the Kubernetes metadata in labels contains dots.
The JSON blob looks like this:
If these labels aren't in the log, ingestion succeeds. One challenge is that the labels vary from service to service, so predicting them is difficult. It would be preferable if there were a way to say: if a key contains a "." (or some other character), substitute it with "_" or whatever the user chooses.
It is possible that this can already be done and I am unaware of how to do so.
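The substitution asked for here could be sketched in Python roughly as follows. This is not a Data Prepper processor or a Fluent Bit filter, just an illustration of the transformation itself; the function name `dedot` and its `replacement` parameter are hypothetical.

```python
def dedot(value, replacement="_"):
    """Recursively replace dots in dictionary keys with `replacement`."""
    if isinstance(value, dict):
        out = {}
        for key, child in value.items():
            new_key = key.replace(".", replacement)
            if new_key in out:
                # Renaming must not silently overwrite another field.
                raise ValueError(
                    f"renaming {key!r} collides with existing key {new_key!r}"
                )
            out[new_key] = dedot(child, replacement)
        return out
    if isinstance(value, list):
        # Dictionaries nested inside arrays are dedotted as well.
        return [dedot(item, replacement) for item in value]
    return value


labels = {"labels": {"app": "fooservice", "app.kubernetes.io/component": "foo"}}
print(dedot(labels))
print(dedot(labels, replacement="@"))
```

Because every key is rewritten regardless of its name, the labels do not need to be known in advance. The collision check is a design choice for the sketch: raising is safer than dropping one of two fields that end up with the same name.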
To Reproduce
Attempt to process and ingest a log into OpenSearch with Data Prepper where the log has keys that contain dots ".".
Such as:
Expected behavior
The key in double quotes is processed as a key even when dots are present.
Environment (please complete the following information):
Additional context
This seems to be related and was merged with a fix, but it is unclear how to resolve this issue.
#450