-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add StreamProvider
for configuring StreamTable
#10600
Add StreamProvider
for configuring StreamTable
#10600
Conversation
@metesynnada @mustafasrepo i believe you were both involved in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @matthewmturner -- this looks pretty neat.
I wonder if there is some example we could add to datafusion-examples
that would show this new API in action? That might help motivate what the API changes are for / help this PR review along as well as help others see the feature
@@ -103,19 +105,29 @@ impl FromStr for StreamEncoding { | |||
} | |||
} | |||
|
|||
/// The configuration for a [`StreamTable`] | |||
pub trait StreamSource: std::fmt::Debug + Send + Sync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps some documentation about what this trait is meant to represent would help
@alamb i would be happy to add example - for this PR it would likely just be copying from https://github.com/apache/datafusion/blob/main/datafusion/core/tests/fifo.rs. I will get to that shortly. Ultimately the motivation here is to start working towards a more generic interface where |
Im hoping to get to a similar API as
Where In the case of streams i had in mind:
Where there are different |
@alamb i have a working example now. i have idea to update it to show more of the streaming nature (i.e. write to the fifo and get batches multiple times) but wont have time for that today. Do you have thoughts in general on whether this type of API could be supported? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this API looks pretty good @matthewmturner thank you
@metesynnada or @ozankabak or @devinjdangelo do you have any thoughts / suggestions on these patterns?
@berkaysynnada PTAL |
I will review it in detail tomorrow |
/// The StreamProvider trait is used as a generic interface for reading and writing from streaming | ||
/// data sources (such as FIFO, Websocket, Kafka, etc.). Implementations of the provider are | ||
/// responsible for providing a `RecordBatchReader` and optionally a `RecordBatchWriter`. | ||
pub trait StreamProvider: std::fmt::Debug + Send + Sync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StreamFormat
might be a better name here, to align with FileFormat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure people associate location with 'Format'. FileFormat does not deal with location. I think StreamProvider is descriptive enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i viewed location as specific to the FileStreamProvider
and Format
as a concept that provided reading and writing capabilities, similar to FileFormat
. But i dont feel strongly either way so im fine to leave at StreamProvider
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After addressing @alamb's feedback, this PR looks good and separates responsibilities with an additional layer. However, FileStreamProvider
currently handles only the CSV format. I suggest focusing more on file format abstraction. Thanks for the example; it makes understanding and usage easier.
/// The StreamProvider trait is used as a generic interface for reading and writing from streaming | ||
/// data sources (such as FIFO, Websocket, Kafka, etc.). Implementations of the provider are | ||
/// responsible for providing a `RecordBatchReader` and optionally a `RecordBatchWriter`. | ||
pub trait StreamProvider: std::fmt::Debug + Send + Sync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure people associate location with 'Format'. FileFormat does not deal with location. I think StreamProvider is descriptive enough.
StreamProvider
for configuring StreamTable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks good to me. Are we waiting on anything else? Otherwise I think it is good to go from my perspective.
Thank you @matthewmturner and @berkaysynnada for your code and review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as well
thank you @alamb @berkaysynnada @ozankabak |
Thanks again everyone |
* Start setting up new StreamTable config * Cleanup * Cleanup * Fix some tests * Cleanup * Start adding example * Feedback
Which issue does this PR close?
Closes #10599
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Not yet, but I will add tests if there is agreement the API is going in the right direction
Are there any user-facing changes?
Yes, new interface for
StreamTable