Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: use native execs instead of custom execs #1262

Merged
merged 8 commits into from
Jul 7, 2023

Conversation

universalmind303
Copy link
Contributor

For us to be able to serialize the plans via protobuf, it makes more sense to use the native execs when possible.

This PR refactors our current custom file execs to use the already existing datafusion execs.

the tricky part with this is that we now have to manage an object store registry that maps to/from urls.

@universalmind303 universalmind303 force-pushed the universalmind303/exec-refactor branch from 7e0a6c5 to 70bc689 Compare July 7, 2023 17:47
@universalmind303
Copy link
Contributor Author

@scsmithr i think this is mostly ready for review. I still have a bit of cleanup to do & add support for s3, but the pattern seems to work.

I tested via

select * from csv_scan('<LOCAL FILE>');
select * from csv_scan('<HTTP URL>');
select * from csv_scan('<gcs bucket>', '<CREDS>');
create external table test gcs OPTIONS ...;
select * from test;
select * from parquet_scan('<LOCAL FILE>');
select * from parquet_scan('<HTTP URL>');
select * from parquet_scan('<gcs bucket>', '<CREDS>');

they are all using the native datafusion execs.

@universalmind303
Copy link
Contributor Author

so now since they are using the native execs, the serialized Exec should have all of the information needed for a remote server to deserialize it & create the appropriate object store. Previously, it didn't contain any information about the url & objectstore.

@scsmithr
Copy link
Member

scsmithr commented Jul 7, 2023

We're likely going to need to repopulate the object stores here too: https://github.com/glaredb/glaredb/blob/2141e562f0cc80b6d4f01566a6c11efd55a41f86/crates/sqlexec/src/context.rs#L560-L568

This is when some other node updates the catalog, and the node that the session is currently running on picks up the newer catalog.

@universalmind303 universalmind303 marked this pull request as ready for review July 7, 2023 21:55
@universalmind303 universalmind303 requested a review from scsmithr July 7, 2023 21:55
@universalmind303 universalmind303 merged commit 17996a9 into main Jul 7, 2023
@universalmind303 universalmind303 deleted the universalmind303/exec-refactor branch July 7, 2023 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants