
snowflake: improve performance on write #760

Merged: 13 commits into r-dbi:main, Mar 15, 2024
Conversation

detule (Collaborator) commented Feb 14, 2024

Snowflake: use the current database/schema, if none are provided, in odbcConnectionColumns and when checking for table existence in dbWriteTable.

Fixes #759.

meztez (Contributor) commented Feb 19, 2024

Do you think this should also be done for the odbcConnectionTables method?

dbExistsTable ends up with the same problem: it lists tables from every DATABASE / SCHEMA, instead of only the one that dbWriteTable will use when checking whether the table already exists in the target database.

    found <- dbExistsTable(conn, name)

    setMethod("dbExistsTable", c("OdbcConnection", "character"),

    setMethod("odbcConnectionTables", c("OdbcConnection", "character"),

    connection_sql_tables <- function(p, catalog_name = NULL, schema_name = NULL,
                                      table_name = NULL, table_type = NULL) {

detule (Collaborator, Author) commented Feb 19, 2024

Thanks @meztez

I thought about this, but I think we need to be a bit careful. We should probably interpret dbExistsTable(conn, "tblname") to mean "does tblname exist anywhere", not just in the current db/schema.

The method has an ellipsis; one approach is to add an odbc-specific argument, something like useCurrentSchemaIfUnspecified, defaulting to FALSE. We would set it to TRUE in dbWriteTable. What do you think?
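A rough sketch of that idea (the argument name comes from the suggestion above; `getCurrentCatalogSchema()` is a hypothetical helper and the `odbcConnectionTables()` argument names are approximate, not the PR's actual implementation):

```r
setMethod("dbExistsTable", c("OdbcConnection", "character"),
  function(conn, name, ..., useCurrentSchemaIfUnspecified = FALSE) {
    if (useCurrentSchemaIfUnspecified) {
      # Scope the lookup to the connection's current catalog/schema
      # (hypothetical helper, not part of the odbc package API).
      current <- getCurrentCatalogSchema(conn)
      res <- odbcConnectionTables(conn, name,
        catalog_name = current$catalog, schema_name = current$schema)
    } else {
      # Current behavior: search every catalog/schema.
      res <- odbcConnectionTables(conn, name)
    }
    NROW(res) > 0
  }
)
```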

meztez (Contributor) commented Feb 19, 2024

@detule Totally makes sense. The issue is only the found check in the dbWriteTable method; in any other use case, your response is on point. I'm not sure of a faster way to do the found check.

hadley (Member) commented Feb 20, 2024

@detule hmmmm, my gut feeling would have been that it only searches the current catalog/schema. We should try to figure out the underlying principle here.

detule (Collaborator, Author) commented Feb 20, 2024

Hey @hadley - copy. Though as currently implemented it searches everywhere. Given the ambiguity in how to interpret it, I wonder if it makes sense to side with "current precedent" so as not to break any code folks might have out in the wild.

This is further, I think, muddled by some anecdotal evidence - for example with SQL Server when you create a temporary table it is not in your current catalog. Rather, it is housed in tempdb.dbo. So - if trying to write to a temp table and only checking for existence in the current catalog, you might never find it.

With this said, I wonder if it makes sense to merge the optimization in this PR since according to #759 it goes a way towards improving performance. Happy to tackle dbExistsTable as a follow up.

@detule detule requested review from hadley and simonpcouch February 28, 2024 09:31
detule (Collaborator, Author) commented Feb 28, 2024

@hadley

Let me know if I missed the mark on this one.

It also allows us, down the line and if needed, to change the default for dbExistsTableForWrite to something like what you suggested: checking the output of SELECT *.
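As an illustration, the "existence via SELECT *" idea could be sketched with generic DBI calls against a live connection (this is not the PR's implementation; error handling is simplified):

```r
# A table "exists for write" if selecting zero rows from it succeeds; this
# naturally respects the current catalog/schema and also finds temporary
# tables wherever the backend houses them (e.g. tempdb.dbo on SQL Server).
exists_via_select <- function(conn, name) {
  qry <- paste0("SELECT * FROM ", DBI::dbQuoteIdentifier(conn, name),
                " WHERE 1 = 0")
  tryCatch({
    DBI::dbGetQuery(conn, qry)
    TRUE
  }, error = function(e) FALSE)
}
```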

@@ -1,5 +1,7 @@
# odbc (development version)

* Snowflake: Improved performance on write (#760).
Collaborator commented on the diff:

By quite a bit! 🎉

Collaborator (Author) replied:

Thanks Simon!

hadley (Member) left a comment:

With regards to the SELECT *, I was thinking of that more as a general principle to guide whether or not a table exists, rather than an implementation idea.

It really feels like, particularly for Snowflake and Databricks, that searching every catalog/schema makes everything really slow, and I'm not convinced that makes sense as a default, since it doesn't correspond to the mental model you build up writing SQL queries. Does that make sense?


#' @rdname Snowflake
setMethod("dbExistsTableForWrite", c("Snowflake", "character"),
function(conn, name, ...) {
Member commented on the diff:
I think schema and catalog should be the explicit arguments here since you're modifying them, and I'm pretty sure that should allow you to drop the do.call.

Collaborator (Author) replied:

Hey - that's also a good thought!

But unless you feel strongly about this, I would err on the side of leaving the signature of dbExistsTableForWrite the same as dbExistsTable - at least for now, with the current default.

Member replied:

I think we can leave the signature of the generic the same, but it's ok for the signature of the method to be different?
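For illustration, S4 does let a method declare extra named arguments after the generic's `...` while the generic's signature stays unchanged. A toy, self-contained sketch (class and function names are made up, not the odbc code):

```r
library(methods)

setClass("Conn", representation(name = "character"))

# The generic exposes only conn, name, and dots.
setGeneric("existsForWrite", function(conn, name, ...) {
  standardGeneric("existsForWrite")
})

# The method adds catalog_name/schema_name even though the generic does not.
setMethod("existsForWrite", "Conn",
  function(conn, name, ..., catalog_name = NULL, schema_name = NULL) {
    is.null(catalog_name) && is.null(schema_name)
  }
)

existsForWrite(new("Conn", name = "x"), "tbl")                      # TRUE
existsForWrite(new("Conn", name = "x"), "tbl", catalog_name = "db") # FALSE
```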

Collaborator (Author) replied:

Done - let me know if that's not what you had in mind.

detule (Collaborator, Author) commented Mar 2, 2024

> With regards to the SELECT *, I was thinking of that more as a general principle to guide whether or not a table exists, rather than an implementation idea.
>
> It really feels like, particularly for Snowflake and Databricks, that searching every catalog/schema makes everything really slow, and I'm not convinced that makes sense as a default, since it doesn't correspond to the mental model you build up writing SQL queries. Does that make sense?

I think I am with you, in particular when writing. We should assume that the identifier passed is within the scope of the CURRENT catalog/schema, or is a temporary table. There is some nuance with temporary tables: I think we are safe with what we have done here for Snowflake, since its temporary tables are always created in the current catalog/schema, but for other back-ends this may not be the case.

I plan on following up for Databricks - I am not even sure we are creating temporary tables correctly there when using dbWriteTable(..., temporary = TRUE).

hadley (Member) commented Mar 5, 2024

Ah yeah, I keep forgetting about temporary tables and how they live in different places in different databases. I think that's a nice place where the SELECT * idea works (since presumably if you CREATE TEMPORARY TABLE foo then SELECT * FROM foo will work), but it may be hard to translate that idea into code that uses the odbc interface.


#' @rdname Snowflake
setMethod("dbExistsTableForWrite", c("Snowflake", "character"),
function(conn, name, ...,
Member replied:
Yeah, this looks good!

@detule detule merged commit 3bca21d into r-dbi:main Mar 15, 2024
16 checks passed
4 participants