Data Prepper Event Json Codec #4404

Closed
graytaylor0 opened this issue Apr 9, 2024 · 4 comments · Fixed by #4436
Labels
enhancement New feature or request

Comments

@graytaylor0
Member

graytaylor0 commented Apr 9, 2024

Is your feature request related to a problem? Please describe.
A Data Prepper Event includes the Event data (the user's data), but it also carries other attributes, such as the Event Metadata and Event tags, and this set could grow in the future.

Describe the solution you'd like
A standard codec that can be used to represent a full Data Prepper Event, including the metadata and tags. I am proposing

codec:
   event_json:

as the identifier for this codec.

As an input codec for use in sources like S3, event_json would read each record in as a Data Prepper Event: the data is written to the Event data, the metadata to the Event Metadata, and the tags to the Event tags.

As an output codec, the JSON representing the full Event would look something like the following:

{
   "event_data": {  EVENT_DATA_JSON },
   "event_metadata": { EVENT_METADATA_JSON },
   "event_tags": [ "EVENT_TAGS" ]
}
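
As a rough illustration of the input side, here is a minimal Python sketch of how a codec could split this envelope into its three parts. It is not Data Prepper's implementation: the `Event` dataclass and `decode_event` function are hypothetical names, the sample values are made up, and treating a missing `event_metadata` or `event_tags` key as empty is an assumption. Only the three key names come from the proposal above.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    """Hypothetical stand-in for a Data Prepper Event."""
    data: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)
    tags: list[str] = field(default_factory=list)


def decode_event(line: str) -> Event:
    """Parse one serialized envelope into an Event."""
    envelope = json.loads(line)
    return Event(
        data=envelope["event_data"],
        # Assumption: absent metadata or tags are treated as empty.
        metadata=envelope.get("event_metadata", {}),
        tags=envelope.get("event_tags", []),
    )


raw = (
    '{"event_data": {"message": "hello"},'
    ' "event_metadata": {"origin": "example"},'
    ' "event_tags": ["parsed"]}'
)
event = decode_event(raw)
print(event.data, event.metadata, event.tags)
```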


@dlvenable
Member

@graytaylor0, do you expect the following to be JSON or serialized JSON strings?

  • EVENT_DATA_JSON
  • EVENT_METADATA_JSON

Also, why not just use data, metadata, and tags? Is there a particular reason for the event_ prefix?

Tags are part of the event metadata. Why have them at the top-level when they could be expressed in the metadata?

@graytaylor0
Member Author

graytaylor0 commented Apr 9, 2024

> Do you expect the following to be JSON or serialized JSON strings?
>
> * EVENT_DATA_JSON
> * EVENT_METADATA_JSON

I expect these to be JSON, not serialized JSON strings.
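
To make the distinction concrete, here is a small Python sketch comparing the two encodings. The sample payload is made up, and the `event_data` key is borrowed from the proposal purely for illustration.

```python
import json

data = {"message": "hello", "status": 200}

# Nested JSON: the event data is embedded as a JSON object.
nested = json.dumps({"event_data": data})

# Serialized JSON string: the event data is encoded first,
# then embedded as a plain string value.
stringified = json.dumps({"event_data": json.dumps(data)})

print(nested)
# {"event_data": {"message": "hello", "status": 200}}
print(stringified)
# {"event_data": "{\"message\": \"hello\", \"status\": 200}"}

# A reader needs one parse for the nested form, two for the string form.
nested_status = json.loads(nested)["event_data"]["status"]
string_status = json.loads(json.loads(stringified)["event_data"])["status"]
```

The nested form is directly queryable by any JSON reader, whereas the string form requires every consumer to know that a second parse is needed.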

> Also, why not just use data, metadata, and tags? Is there a particular reason for the event_ prefix?
>
> Tags are part of the event metadata. Why have them at the top-level when they could be expressed in the metadata?

No particular reason to prefix with event_. I am fine with just using data and metadata.

And yes, we could consolidate the tags and the rest of the metadata under metadata, so it would look something like:

{
  "data": { JSON_DATA },
  "metadata": {
      "attributes": { ATTRIBUTES_JSON_MAP },
      "tags": [ TAGS ],
      "timeReceived": timestamp,
      "externalOriginationTime": timestamp
  }
}

We should be able to use Jackson to serialize and deserialize the Event metadata here as well. One thought though is should the timeReceived be preserved by the InputCodec, or should the timeReceived be overridden when the Event is created again from the source codec?
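
For illustration only, the consolidated structure could be produced along these lines. This is a Python sketch rather than the Jackson-based approach mentioned above, and it is not Data Prepper's implementation: `encode_event` is a hypothetical name, the ISO-8601 timestamp format is an assumption (the thread does not specify one), and omitting `externalOriginationTime` when it is unset is also an assumption.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional


def encode_event(
    data: dict[str, Any],
    attributes: dict[str, Any],
    tags: list[str],
    time_received: datetime,
    external_origination_time: Optional[datetime] = None,
) -> str:
    """Serialize an event into the consolidated data/metadata envelope."""
    metadata: dict[str, Any] = {
        "attributes": attributes,
        "tags": tags,
        # Assumption: timestamps are written as ISO-8601 strings.
        "timeReceived": time_received.isoformat(),
    }
    # Assumption: the optional timestamp is left out when it is not set.
    if external_origination_time is not None:
        metadata["externalOriginationTime"] = external_origination_time.isoformat()
    return json.dumps({"data": data, "metadata": metadata})


line = encode_event(
    data={"message": "hello"},
    attributes={"origin": "example"},
    tags=["parsed"],
    time_received=datetime(2024, 4, 9, 12, 0, tzinfo=timezone.utc),
)
print(line)
```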

@dlvenable
Member

I like that newer model.

> One thought though is should the timeReceived be preserved by the InputCodec, or should the timeReceived be overridden when the Event is created again from the source codec?

I think we should make the timeReceived the time that Data Prepper re-reads it. So the InputCodec should override it.

We could possibly add a new attribute to hold the original time received if we wanted.
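
A sketch of what that override could look like on the read path, again in Python and purely illustrative. The `decode_event` function and its `now` parameter are hypothetical, and `originalTimeReceived` is a made-up attribute name standing in for the "new attribute" idea; nothing here reflects a decided design.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional


def decode_event(line: str, now: Optional[datetime] = None) -> dict[str, Any]:
    """Re-create an event, stamping timeReceived with the re-read time."""
    envelope = json.loads(line)
    metadata = dict(envelope.get("metadata", {}))
    attributes = dict(metadata.get("attributes", {}))

    original = metadata.get("timeReceived")
    if original is not None:
        # Hypothetical attribute name for keeping the first receive time.
        attributes["originalTimeReceived"] = original

    metadata["attributes"] = attributes
    # Override: timeReceived becomes the moment the event is read again.
    read_time = now if now is not None else datetime.now(timezone.utc)
    metadata["timeReceived"] = read_time.isoformat()
    return {"data": envelope["data"], "metadata": metadata}


stored = (
    '{"data": {"message": "hello"},'
    ' "metadata": {"attributes": {}, "tags": ["parsed"],'
    ' "timeReceived": "2024-04-09T12:00:00+00:00"}}'
)
event = decode_event(stored, now=datetime(2024, 4, 10, tzinfo=timezone.utc))
print(event["metadata"])
```

Passing `now` explicitly keeps the sketch deterministic; a real codec would presumably take the current time when the record is read.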

@kkondaka
Collaborator

@graytaylor0 Is this really needed? We already allow including tags in the data, and entries in the metadata can be selectively moved to the data using the add_entries processor. We may have some Data Prepper internal information stored in tags or metadata, and users might not want it included. What's the motivation behind this?
