feat: Delta lake data source (initial implementation) #1119
This looks good — just a little comment.
We should add EXTERNAL TABLE support as well.
You also mentioned connecting without "catalog". How does that work? A catalog is just a database, right?
let _resp = client
    .get(format!("{}/api/2.1/unity-catalog/catalogs", workspace_url))
    .send()
    .await?;
Should check the response status code?
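One possible shape for such a check, as a minimal sketch only: treat any non-2xx status as an error before trusting the response. `ensure_success` and `CatalogStatusError` are hypothetical names that are not part of this PR, and the function takes a bare status code so the sketch stays independent of the HTTP client. If `client` is a `reqwest::Client` (an assumption; the snippet does not say), calling `error_for_status()` on the response would likely serve the same purpose.

```rust
use std::fmt;

/// Hypothetical error for a catalog request that did not return 2xx.
#[derive(Debug, PartialEq)]
struct CatalogStatusError {
    status: u16,
}

impl fmt::Display for CatalogStatusError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "catalog request failed with HTTP status {}", self.status)
    }
}

impl std::error::Error for CatalogStatusError {}

/// Returns `Ok(())` for any 2xx status and an error for everything else.
fn ensure_success(status: u16) -> Result<(), CatalogStatusError> {
    if (200..300).contains(&status) {
        Ok(())
    } else {
        Err(CatalogStatusError { status })
    }
}
```

Because `CatalogStatusError` implements `std::error::Error`, a call such as `ensure_success(code)?` could sit right after the request in a function that returns a boxed or converted error type.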
I opted not to do that in this PR since I'm not sure how we want to handle it. Delta files can be accessed without a catalog (they're just files in object storage). If we added EXTERNAL TABLE support, what would we do? Require that a catalog be provided, or just have the user specify the path to the file? It's not clear to me what the best solution is here. I figure we'll learn more as we continue to flesh this out.
In the case of deltalake, the catalog just provides us the location of objects in some object store. The catalog isn't needed if the location of an object is known ahead of time, since everything needed to read or modify a delta file is self-contained. For example, the Databricks deployment I have set up on AWS stores delta files in S3. Making the GET request for one of the tables just returns the table's location in S3. I then use the provided credentials when actually accessing those objects in S3.
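To make the two options concrete, here is a rough sketch of how "catalog lookup" and "known location" could sit behind one type. Everything in it is hypothetical and not what this PR implements: `TableSource`, `resolve_location`, and the `lookup` callback (which stands in for the catalog GET request) are illustrative names only.

```rust
/// Hypothetical ways a user might point at a Delta table.
#[derive(Debug, Clone, PartialEq)]
enum TableSource {
    /// Ask a catalog where the table lives.
    Catalog {
        catalog: String,
        schema: String,
        table: String,
    },
    /// The object-store location is already known, so no catalog is needed.
    Location(String),
}

/// Resolves a source to an object-store location.
/// `lookup` stands in for the catalog request and returns `None`
/// when the catalog does not know the table.
fn resolve_location<F>(source: &TableSource, lookup: F) -> Result<String, String>
where
    F: Fn(&str, &str, &str) -> Option<String>,
{
    match source {
        // A known location is returned as-is; the catalog is never consulted.
        TableSource::Location(loc) => Ok(loc.clone()),
        TableSource::Catalog {
            catalog,
            schema,
            table,
        } => lookup(catalog.as_str(), schema.as_str(), table.as_str()).ok_or_else(|| {
            format!("table {}.{}.{} not found in catalog", catalog, schema, table)
        }),
    }
}
```

With a shape like this, an EXTERNAL TABLE definition could accept either variant, and only the `Catalog` variant would trigger a network call. Whether that is the right trade-off is exactly the open question above.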
Adds delta tables/deltalake as a data source using delta-rs.
What this looks like:
And querying works just like the other data sources:
Current status
Suffice it to say there's a lot missing, and things will change. I want to get a general framework in place to build on.