feat: avoid double fetches on ibis.duckdb.connect().read_csv("https://slow_url").cache()
#10845
Open
Labels
feature
Features or general enhancements
Is your feature request related to a problem?
When you call `.read_csv()` on the duckdb backend, duckdb fetches [some] of the data in order to sniff the schema. Then, when you call `.cache()` on the created view, it fetches the full data.
This is related to #9931.
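For illustration, here is a minimal sketch of a workaround that avoids the second network fetch by downloading the file once and pointing duckdb at the local copy. This is not an existing ibis feature: `download_once`, `read_remote_csv_cached`, and the `data.csv` file name are hypothetical names made up for this example, and it assumes the file fits on local disk.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from typing import Optional


def download_once(url: str, dest_dir: Optional[str] = None) -> Path:
    """Download `url` to a local file and return its path.

    Hypothetical helper: if the destination file already exists,
    the download is skipped, so the network is hit at most once.
    """
    target_dir = Path(dest_dir or tempfile.mkdtemp())
    dest = target_dir / "data.csv"  # assumed file name, for illustration only
    if not dest.exists():
        with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)
    return dest


def read_remote_csv_cached(url: str):
    """Hypothetical helper: one network fetch, then local sniffing and caching."""
    import ibis  # imported lazily so download_once works without ibis installed

    con = ibis.duckdb.connect()
    local_path = download_once(url)
    # Both the schema sniffing in read_csv() and the full read in cache()
    # now hit the local copy instead of the slow URL.
    return con.read_csv(str(local_path)).cache()
```

The trade-off is that the whole file is written to disk before duckdb sees it, which may not be acceptable for very large tables. A built-in option on `.read_csv()` could presumably avoid that extra copy.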
What is the motivation behind your request?
I am working on relatively large tables on a slow internet connection. Each fetch takes about 30 seconds. I would like to avoid this double fetch.
Describe the solution you'd like
Since the result of `.read_csv()` needs to be a Table with a known schema, it has to fetch some data during that function call. So, I think we need to add an optional argument to the function, or create an entirely new function. I would vote for adding params if we can come up with something sane. Maybe `cache: bool`?
What version of ibis are you running?
main
What backend(s) are you using, if any?
duckdb