-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: use native execs instead of custom execs #1262
Conversation
7e0a6c5
to
70bc689
Compare
@scsmithr i think this is mostly ready for review. I still have a bit of cleanup to do & add support for s3, but the pattern seems to work. I tested via select * from csv_scan('<LOCAL FILE>');
select * from csv_scan('<HTTP URL>');
select * from csv_scan('<gcs bucket>', '<CREDS>');
create external table test gcs OPTIONS ...;
select * from test;
select * from parquet_scan('<LOCAL FILE>');
select * from parquet_scan('<HTTP URL>');
select * from parquet_scan('<gcs bucket>', '<CREDS>'); they are all using the native datafusion execs. |
so now since they are using the native execs, the serialized Exec should have all of the information needed for a remote server to deserialize it & create the appropriate object store. Previously, it didn't contain any information about the url & objectstore. |
We're likely going to need to repopulate the object stores here too: https://github.com/glaredb/glaredb/blob/2141e562f0cc80b6d4f01566a6c11efd55a41f86/crates/sqlexec/src/context.rs#L560-L568 This is when some other node updates the catalog, and the node that the session is currently running on picks up the newer catalog. |
For us to be able to serialize the plans via protobuf, it makes more sense to use the native execs when possible.
This PR refactors our current custom file execs to use the already existing datafusion execs.
the tricky part with this is that we now have to manage an object store registry that maps to/from urls.