Specialized formats for big targets #977

Merged: 28 commits, merged Aug 5, 2019

@wlandau (Member) commented Aug 5, 2019

Summary

This PR implements specialized data storage and retrieval. By working around storr, we decrease memory consumption, and we increase speed for data frames and Keras models. Formats "fst", "keras", and "rds" are currently supported, and it will be easy to add more.

This is an alternative to richfitz/storr#111 for specialized scenarios, with greater efficiency in those scenarios (ref: #907 (comment)). cc @nettoyoussef, @richfitz, @MarcusKlik. @kendonB and @cimentadaj, you might also be interested.

How it works

Select a format either in target() or by manually appending a "format" column to the plan.

library(drake)
n <- 1e8 # Each target is 1.6 GB in memory.
plan <- drake_plan(
  data_fst = target(
    data.frame(x = runif(n), y = runif(n)),
    format = "fst"
  ),
  data_old = data.frame(x = runif(n), y = runif(n))
)
make(plan)
#> target data_fst
#> target data_old
build_times(type = "build")
#> # A tibble: 2 x 4
#>   target   elapsed              user                 system    
#>   <chr>    <Duration>           <Duration>           <Duration>
#> 1 data_fst 13.93s               37.562s              7.954s    
#> 2 data_old 184s (~3.07 minutes) 177s (~2.95 minutes) 4.157s

Created on 2019-08-05 by the reprex package (v0.3.0)
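The reprex above sets the format through target(). The other route, appending a "format" column to the plan by hand, might look like the minimal sketch below. It assumes, as in the target() version, that NA in that column means the target keeps the default storr-based storage. The small n only keeps the example cheap, and actually running make(plan) with the "fst" format would also need the fst package installed.

```r
library(drake)

n <- 1e3 # Small n so the sketch runs quickly.
plan <- drake_plan(
  data_fst = data.frame(x = runif(n), y = runif(n)),
  data_old = data.frame(x = runif(n), y = runif(n))
)

# Manually append a "format" column: "fst" for data_fst, and NA
# (assumed to mean default storage) for data_old.
plan$format <- c("fst", NA)
plan
```

This should describe the same two targets as the target(format = "fst") plan, just with the format set after the plan is built.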

Implementation

drake now uses a decorated storr internally, which is just an ordinary storr wrapped up in a reference class.
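To make the decorator idea concrete, here is a minimal base-R sketch of "an ordinary store wrapped up in a reference class". It is not drake's code: the class name DecoratedStore, its set()/get() methods, the list-based bookkeeping, and the use of an environment as a stand-in for a real storr are all assumptions for illustration, and only an "rds" branch is shown.

```r
library(methods)

# Hypothetical sketch, not drake's implementation: an environment stands in
# for an ordinary storr, and the reference class routes targets with a
# special format to their own files.
DecoratedStore <- setRefClass(
  "DecoratedStore",
  fields = list(
    store = "environment", # stand-in for the wrapped storr
    path = "character"     # directory for specially formatted files
  ),
  methods = list(
    set = function(key, value, format = NA_character_) {
      if (identical(format, "rds")) {
        # Special format: write the object to its own file and keep
        # only a small reference to that file in the wrapped store.
        rds_file <- file.path(path, paste0(key, ".rds"))
        saveRDS(value, rds_file)
        assign(key, list(format = "rds", file = rds_file), envir = store)
      } else {
        # Default: keep the value in the wrapped store itself.
        assign(key, list(format = NA_character_, value = value), envir = store)
      }
      invisible(NULL)
    },
    get = function(key) {
      # base::get avoids recursing into this method of the same name.
      entry <- base::get(key, envir = store, inherits = FALSE)
      if (identical(entry$format, "rds")) readRDS(entry$file) else entry$value
    }
  )
)

# Usage: one target stored as its own .rds file, one kept in the store.
cache <- DecoratedStore$new(store = new.env(), path = tempdir())
cache$set("big", data.frame(x = runif(5)), format = "rds")
cache$set("small", "hello")
cache$get("small")
```

The point of the wrapper is that callers keep using one interface while the class decides, per key, whether to go through the plain store or a format-specific reader and writer.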

@wlandau self-assigned this Aug 5, 2019

@wlandau (Member, Author) commented Aug 5, 2019

Also cc @schnorr and @guilhermealles. target(format = "fst") could really help if you are still interested in integrating drake into https://github.com/schnorr/starvz.

@codecov-io commented Aug 5, 2019

Codecov Report

Merging #977 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #977    +/-   ##
======================================
  Coverage     100%   100%            
======================================
  Files          45     46     +1     
  Lines        6900   6323   -577     
======================================
- Hits         6900   6323   -577
Impacted Files             Coverage Δ
R/r_make.R                 100% <ø> (ø)     ⬆️
R/sankey_drake_graph.R     100% <ø> (ø)     ⬆️
R/clean.R                  100% <100%> (ø)  ⬆️
R/deprecated.R             100% <100%> (ø)  ⬆️
R/drake_meta_.R            100% <100%> (ø)  ⬆️
R/drake_config.R           100% <100%> (ø)  ⬆️
R/decorate_storr.R         100% <100%> (ø)
R/cache.R                  100% <100%> (ø)  ⬆️
R/drake_history.R          100% <100%> (ø)  ⬆️
R/drake_plan.R             100% <100%> (ø)  ⬆️
... and 48 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ffe4a7e...49af834.
