Dynamic branching #1042
Based on the global seed, a user-supplied target seed, and the subtarget name.
Codecov Report
@@ Coverage Diff @@
## master #1042 +/- ##
=======================================
Coverage 100% 100%
=======================================
Files 47 47
Lines 6737 7161 +424
=======================================
+ Hits 6737 7161 +424
Continue to review full report at Codecov.
More details and discussion: #685 (comment)
Forgot to mention: for anything without a
library(drake)
library(tibble)
plan <- drake_plan(
x = as_tibble(mtcars),
y = target(x, dynamic = map(x))
)
make(plan, verbose = 0L)
readd(y, subtargets = 1:2)
#> [[1]]
#> # A tibble: 1 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#>
#> [[2]]
#> # A tibble: 1 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.88 17.0 0 1 4 4
Created on 2019-11-02 by the reprex package (v0.3.0)
I anticipate that this will cause confusion in the case of data.frames. It might be better to have explicit control over which way the iteration goes. This is especially true for Consider making the map dimension variable, with the default being what purrr would do?
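To make the two iteration directions in this comment concrete, here is a minimal base R sketch. It does not call drake at all: it only contrasts row-wise slicing, which is what the reprex output above shows for a data frame, with column-wise iteration, which is what purrr::map() and lapply() do with a data frame. The names df, by_row, and by_col are made up for illustration.

```r
# Base R only; no drake calls here. Illustrates the two iteration directions.
df <- head(mtcars, 3)

# Row-wise slices: one single-row data frame per element,
# analogous to the sub-targets shown in the reprex output above.
by_row <- lapply(seq_len(nrow(df)), function(i) df[i, , drop = FALSE])

# Column-wise iteration: a data frame is a list of columns,
# so lapply() (like purrr::map()) visits one column at a time.
by_col <- lapply(df, function(column) column)

length(by_row)  # one element per row of df
length(by_col)  # one element per column of df
```

The same input therefore yields a different number of pieces depending on the direction, which is the source of the confusion raised here.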
This looks really cool! How would I get the subtarget associated with a particular value of country here?
Re #1042 (comment), I will think about it, but please do not get your hopes up. Row-oriented workflows come up a lot more than looping over columns, so I want the former to take priority. And other workarounds like Anyway, here is how to get the model for a country and the aggregate for a continent. There is no general way to assign suggestive names during
library(broom)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(drake)
library(gapminder)
fit_model <- function(dataset, country) {
  dataset %>%
    filter(country == !!country) %>%
    lm(formula = gdpPercap ~ year) %>%
    tidy() %>%
    mutate(country = !!country)
}
plan <- drake_plan(
  dataset = gapminder,
  country = unique(dataset$country),
  model = target(
    fit_model(dataset, country),
    dynamic = map(country)
  ),
  continent = dataset %>%
    distinct(continent, country) %>%
    pull(continent),
  results = target(
    bind_rows(model) %>%
      mutate(continent = continent),
    dynamic = combine(model, .by = continent)
  )
)
make(plan, verbose = 0L)
countries <- readd(country)
index <- which(countries == "Philippines")
readd(model, subtargets = index)[[1]]
#> # A tibble: 2 x 6
#> term estimate std.error statistic p.value country
#> <chr> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 (Intercept) -53730. 6773. -7.93 0.0000127 Philippines
#> 2 year 28.2 3.42 8.25 0.00000895 Philippines
continents <- unique(readd(continent))
index <- which(continents == "Oceania")
readd(results, subtargets = index)
#> [[1]]
#> # A tibble: 4 x 7
#> term estimate std.error statistic p.value country continent
#> <chr> <dbl> <dbl> <dbl> <dbl> <fct> <fct>
#> 1 (Intercep… -824973. 47418. -17.4 8.34e-9 Australia Oceania
#> 2 year 427. 24.0 17.8 6.61e-9 Australia Oceania
#> 3 (Intercep… -459336. 26886. -17.1 9.96e-9 New Zeala… Oceania
#> 4 year 241. 13.6 17.7 6.96e-9 New Zeala… Oceania
Created on 2019-11-03 by the reprex package (v0.3.0)
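Since sub-targets are addressed by position here, one possible convenience is a named lookup from value to index. The sketch below is base R only and uses a made-up stand-in vector in place of readd(country); index_of is a hypothetical name, not part of drake.

```r
# Hypothetical stand-in for readd(country); not real pipeline output.
countries <- c("Afghanistan", "Albania", "Philippines")

# Named integer vector: names are the values, entries are positions.
# as.character() guards against the keys being stored as a factor.
index_of <- setNames(seq_along(countries), as.character(countries))

index_of[["Philippines"]]
```

The resulting integer could then be passed as the subtargets argument of readd(), in the same way as the which() calls above.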
FYI, I decided to extend dynamic
library(broom)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(drake)
library(gapminder)
fit_model <- function(dataset, country) {
  dataset %>%
    lm(formula = gdpPercap ~ year) %>%
    tidy() %>%
    mutate(country = !!country)
}
plan <- drake_plan(
  dataset = gapminder,
  country = dataset$country,
  model = target(
    fit_model(dataset, country),
    dynamic = combine(dataset, .by = country)
  )
)
make(plan, verbose = 0L)
readd(model, subtargets = 1)
#> [[1]]
#> # A tibble: 2 x 6
#> term estimate std.error statistic p.value country
#> <chr> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 (Intercept) 1675. 3747. 0.447 0.664 Afghanistan
#> 2 year -0.441 1.89 -0.233 0.821 Afghanistan
Created on 2019-11-03 by the reprex package (v0.3.0)
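As a rough mental model for the grouping in the example above, the base R sketch below splits a data frame into one piece per distinct value of a grouping vector. This is only an analogy: it is not how drake implements combine() with .by, the ordering of groups in drake may differ from what split() produces, and the data frame df and its columns are invented for illustration.

```r
# Base R analogy only; this is not how drake implements combine().
df <- data.frame(
  group = c("a", "b", "a", "b", "c"),
  value = c(1, 2, 3, 4, 5)
)

# One data frame per distinct value of the grouping vector,
# similar in spirit to one sub-target per country above.
pieces <- split(df, df$group)

names(pieces)
sapply(pieces, nrow)
```

Each element of pieces plays the role that a per-country slice of the dataset plays in the plan above: the function applied to it sees only the rows for that group.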
New chapter in the manual: https://ropenscilabs.github.io/drake-manual/dynamic.html
Thanks wlandau, you are killing it. Dynamic branching is here! This is a big improvement for large plans. Now we have two choices: static and dynamic branching. One thing I am now confused about: as mentioned in the new chapter of the manual, with all the benefits of dynamic branching, from performance improvement of
What are the pros and cons of static and dynamic branching, and how do we choose between them? Can dynamic branching be a drop-in replacement for static branching in most cases?
With dynamic branching, In fact, you can even use static and dynamic branching together! If you do, I recommend static
library(drake)
plan <- drake_plan(
  locations = get_location_db(), # many rows
  coordinates = target(
    my_geocode_fn(locations, method),
    dynamic = map(locations),
    transform = map(method = c("address", "postal", "boundary"))
  ),
  results = target(
    bind_rows(coordinates),
    dynamic = combine(coordinates),
    transform = combine(coordinates)
  )
)
plot(plan)
Created on 2019-11-04 by the reprex package (v0.3.0)
And thanks for the question, @mik3y64. I think this is a good one for the manual. |
Summary
At long last, drake is capable of dynamic branching! You can now define new targets while make() is running.
Here is a demo that fits GDP vs year for each country in the Gapminder dataset. We let the data decide how many targets we have and what those targets mean.
Created on 2019-11-03 by the reprex package (v0.3.0)
Performance
Dynamic branching makes drake_config() and outdated() much faster, which makes make() much faster to initialize. However, when it comes to building targets, the per-target overhead for dynamic sub-targets can be twice that of static branching. Not such a big deal for a first implementation. After all, it took a long time to get static branching as fast as it is. But performance will always be a long-term project.
Related GitHub issues and pull requests
Checklist
drake's code of conduct.
testthat unit tests to tests/testthat for any new functionality.