-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
snowflake: improve performance on write #760
Conversation
Do you think this should also be done for the dbExistsTable ends up having the same problem when listing tables from all DATABASE / SCHEMA instead of the one that will be used by dbWriteTable to check if the table already exists in the target database? Line 58 in 85327fe
Line 277 in 85327fe
Line 369 in 85327fe
Line 44 in 85327fe
|
Thanks @meztez I thought about this - but i think we need to be a bit careful. We should probably interpret The method has an ellipsis; one approach is to add a |
@detule Totally makes sense. The issue is only the found part of the dbWriteTable method. In any other use case, your response is on point. Not sure the best way to do the found check any faster. |
@detule hmmmm, my gut feeling would have been that it only searches the current catalog/schema. We should try and figure out the underlying principle here. |
Hey @hadley - copy. Though as currently implemented it searches everywhere. I wonder then, if there is some ambiguity in terms of how to interpret it, if it makes sense to side with "current precedent" so as to not break any code folks might have out there in the wild. This is further, I think, muddled by some anecdotal evidence - for example with SQL Server when you create a temporary table it is not in your current catalog. Rather, it is housed in With this said, I wonder if it makes sense to merge the optimization in this PR since according to #759 it goes a way towards improving performance. Happy to tackle |
Let me know if I missed the mark on this one. It allows us to, down the line and if needed, also change the default for |
@@ -1,5 +1,7 @@ | |||
# odbc (development version) | |||
|
|||
* Snowflake: Improved performance on write (#760). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By quite a bit!🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Simon!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With regards to the SELECT *
, I was thinking of that more of a general principle to guide whether or not a table exists, rather than an implementation idea.
It really feels like, particularly for snowflake and databricks, that assuming any catalog/schema makes everything really slow, and I'm not convinced that that makes sense as a default, since it doesn't correspond to the mental model you will have built up writing SQL queries. Does that make sense?
R/driver-snowflake.R
Outdated
|
||
#' @rdname Snowflake | ||
setMethod("dbExistsTableForWrite", c("Snowflake", "character"), | ||
function(conn, name, ...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think schema and catalog should be the explicit arguments here since you're modifying them, and I'm pretty sure that should allow you to drop the do.call
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - that's also a good thought!
But unless you feel strongly about this, I would err on the side of leaving the signature of dbExistsTableForWrite
the same as dbExistsTable
- at least for now, with the current default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can leave the signature of the generic the same, but it's ok for the signature of method to be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - let me know if that's not what you had in mind.
I think I am with you, and in particular when writing. We should assume that the identifier passed is within the scope of the I plan on following up for |
Ah yeah, I keep forgetting about temporary tables, and how they live in different places in different databases. I think that's a nice place where the |
|
||
#' @rdname Snowflake | ||
setMethod("dbExistsTableForWrite", c("Snowflake", "character"), | ||
function(conn, name, ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this looks good!
Snowflake: use current DB/schema, if none are provided, in
odbcConnectionColumns
and when checking for existence indbWriteTable
.Fixes #759.