Feature Request: Speed up saving by automatically choosing the fastest method #1091

Closed
2 of 3 tasks
billdenney opened this issue Dec 4, 2019 · 2 comments

@billdenney
Contributor

Prework

  • Read and abide by drake's code of conduct.
  • Search for duplicates among the existing issues, both open and closed.
  • Advanced users: verify that the bottleneck still persists in the current development version (i.e. remotes::install_github("ropensci/drake")) and mention the SHA-1 hash of the Git commit you install.

Description

This is an offshoot of the conversation in #1089.

When saving results, drake could choose the most efficient storage method automatically. Since each target presumably records the format it was saved in, the choice of format could be automated instead of always falling back to the default. If the user specifies a format, that format should be used; if not, drake would use the best known one.

The benefit is that all users would get the best format known to the drake developers without having to track what that format is.
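
To make the request concrete, here is a minimal sketch of the selection rule in base R. The function `choose_format()` and its `user_format` argument are hypothetical names used for illustration and are not part of drake; treating `"rds"` as the fallback is also an assumption.

```r
# Hypothetical helper, not part of drake: pick a storage format for a value.
# An explicit user choice always wins; otherwise apply a simple rule of thumb.
choose_format <- function(value, user_format = NULL) {
  if (!is.null(user_format)) {
    return(user_format)
  }
  if (is.data.frame(value)) {
    # "fst" is the format discussed in #1089 for data frames.
    return("fst")
  }
  # Assumed fallback for everything else.
  "rds"
}

df <- data.frame(x = 1:3)
choose_format(df)                       # data frame, no user choice
choose_format(df, user_format = "rds")  # user choice overrides the rule
choose_format(1:10)                     # not a data frame
```

The rule itself is short; the open question is what happens when the "best" format changes after results have already been saved.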

Reproducible example

Please see the discussion in #1089 indicating that saving data.frames would be faster with "fst" format.

Benchmarks

Please see the discussion in #1089 indicating that saving data.frames would be faster with "fst" format.

@wlandau
Member

wlandau commented Dec 4, 2019

Choosing the best format is possible to automate, but I would rather not go that route.

At some point, fst might not be the best format for data frames. If a new package (say, arrow) comes along and uniformly outperforms fst, then we would want to switch to a different way of saving data frames, and it gets messy if old workflows already have fst files saved.

Also, if drake strongly depends on all the packages that support formats (fst, data.table, keras, etc.), then the number of dependencies grows large really quickly. So all those packages are in Suggests in the DESCRIPTION, and we need behavior to be consistent regardless of whether they are installed.
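
The consistency concern can be illustrated with a small sketch. The helper `save_data_frame()` below is hypothetical and is not how drake stores targets; it only shows that an automatic choice built on an optional package gives different results on different machines.

```r
# Hypothetical helper, not part of drake: save a data frame with fst when the
# optional package is installed, otherwise fall back to base R's saveRDS().
save_data_frame <- function(df, path) {
  if (requireNamespace("fst", quietly = TRUE)) {
    fst::write_fst(df, path)
    "fst"
  } else {
    saveRDS(df, path)
    "rds"
  }
}

path <- tempfile()
format_used <- save_data_frame(data.frame(x = 1:3), path)
# The same call yields "fst" on one machine and "rds" on another,
# depending only on which suggested packages happen to be installed.
print(format_used)
```

With an explicit, user-specified format, the same code either uses the requested format or fails clearly, which is easier to keep consistent.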

@wlandau closed this as completed Dec 4, 2019
@billdenney
Contributor Author

Fair enough!
