Feature Request: Speed up saving by automatically choosing the fastest method #1091

billdenney · 2019-12-04T21:05:14Z

Prework

Read and abide by drake's code of conduct.
Search for duplicates among the existing issues, both open and closed.
Advanced users: verify that the bottleneck still persists in the current development version (i.e. remotes::install_github("ropensci/drake")) and mention the SHA-1 hash of the Git commit you install.

Description

This is an offshoot of the conversation in #1089.

When saving results, drake could choose the most efficient storage method automatically. Presumably each target already records the format it was saved in, so the choice could be automated instead of always falling back to the default. If the user specifies a format, that format should be used; if not, use the best known one.

The benefit is that all users would get the best format known to the drake developers without having to track what that format is.
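
To make the request concrete, here is a minimal sketch of the selection rule in base R. It is not drake's implementation: `choose_format()` and its `fst_available` argument are hypothetical names invented for illustration, and the string "rds" simply stands in for whatever the default serialization is.

```r
# Hypothetical helper, not part of drake: pick a storage format for a target value.
choose_format <- function(value, user_format = NULL,
                          fst_available = requireNamespace("fst", quietly = TRUE)) {
  # An explicit user choice always wins.
  if (!is.null(user_format)) {
    return(user_format)
  }
  # Otherwise prefer "fst" for data frames, but only if the fst package is installed.
  if (is.data.frame(value) && fst_available) {
    return("fst")
  }
  # Everything else falls back to the default (assumed here to be "rds").
  "rds"
}

choose_format(mtcars, user_format = "rds")   # user-specified format is respected
choose_format(mtcars, fst_available = TRUE)  # data frame, fst installed
choose_format(1:10, fst_available = TRUE)    # not a data frame
```

Note that under this rule the automatic choice for a data frame depends on whether fst happens to be installed on the machine.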

Reproducible example

Please see the discussion in #1089 indicating that saving data.frames would be faster with "fst" format.

Benchmarks

Please see the discussion in #1089 indicating that saving data.frames would be faster with "fst" format.


wlandau · 2019-12-04T21:32:53Z

Choosing the best format is possible to automate, but I would rather not go that route.

At some point, fst might not be the best format for data frames. If a new package (say, arrow) comes along and uniformly outperforms fst, then we would want to switch to a different way of saving data frames, and it gets messy if old workflows already have fst files saved.

Also, if drake strongly depends on all the packages that support formats (fst, data.table, keras, etc.), then the number of dependencies grows large really quickly. That is why all those packages are in Suggests in the DESCRIPTION, and we need behavior to be consistent regardless of whether they are installed.

billdenney · 2019-12-04T21:34:49Z

Fair enough!

billdenney added the topic: performance label Dec 4, 2019

billdenney assigned wlandau Dec 4, 2019

wlandau closed this as completed Dec 4, 2019
