missing dependencies in drake manual: main example #579

Closed
namarks opened this issue Nov 10, 2018 · 2 comments

namarks commented Nov 10, 2018

I'm new to drake and have replicated the "main" example from the drake manual (https://github.com/ropenscilabs/drake-manual) by downloading the files with drake_example("main"). The plan runs, but when looking at the vis_drake_graph() output I noticed that some of the dependencies, such as Petal Width, are missing (below, in pink). Many of them don't even show up in the manual's version of the chart.

Why is there this discrepancy and why does the plan still run even though there are missing values?

Thanks!

# Load our packages and supporting functions into our session.
source(file.path("R", "setup.R"))

# Create the `drake` plan that outlines the work we are going to do.
source(file.path("R", "plan.R"))

# Run your work with make().
make(plan)

# Create the visualization.
drake_config(plan) %>% vis_drake_graph()

[Screenshot: vis_drake_graph() output, with the missing dependencies shown in pink]

wlandau (Member) commented Nov 10, 2018

Hmmm... I thought I fixed that in drake 6.0.0. What version of drake are you using?

library(drake)
packageVersion("drake")
#> [1] '6.1.0.9000'
load_main_example() # New in 6.1.0
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
make(plan)
#> target raw_data
#> target data
#> target fit
#> target hist
#> target report
config <- drake_config(plan)
vis_drake_graph(config)

Created on 2018-11-10 by the reprex package (v0.2.1)

As I describe in #578, drake detects dependencies in commands and imported functions by analyzing code without running it. You can see some of that with deps_code().

library(drake)
load_main_example() # New in 6.1.0
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
print(create_plot)
#> function(data) {
#>   ggplot(data, aes(x = Petal.Width, fill = Species)) +
#>     geom_histogram(binwidth = 0.25) +
#>     theme_gray(20)
#> }
deps_code(create_plot)
#> $globals
#> [1] "ggplot"         "aes"            "Petal.Width"    "Species"       
#> [5] "geom_histogram" "theme_gray"

Created on 2018-11-10 by the reprex package (v0.2.1)

Part of that process is based on codetools::findGlobals().

library(drake)
load_main_example() # New in 6.1.0
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(codetools)
findGlobals(create_plot)
#> [1] "{"              "+"              "aes"            "geom_histogram"
#> [5] "ggplot"         "Petal.Width"    "Species"        "theme_gray"

Created on 2018-11-10 by the reprex package (v0.2.1)
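
As a small aside, findGlobals() can also report functions and variables separately, which makes it easier to see that Petal.Width and Species are picked up as ordinary global variables. The sketch below redefines create_plot() locally so it stands alone; ggplot2 does not need to be installed because the function is only inspected, never called.

```r
library(codetools)

# Same body as create_plot() in the main example, redefined here so the
# snippet is self-contained. It is never called, only inspected.
create_plot <- function(data) {
  ggplot(data, aes(x = Petal.Width, fill = Species)) +
    geom_histogram(binwidth = 0.25) +
    theme_gray(20)
}

# merge = FALSE returns a list with separate `functions` and `variables`.
globals <- findGlobals(create_plot, merge = FALSE)
print(globals$functions)
print(globals$variables)
```

The NSE column names land in `variables` next to any real global objects, which is why static analysis alone cannot tell them apart.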

The process tries to predict what the code will need in order to run. The purpose is to (1) resolve the order of execution of targets, and (2) help determine which targets are out of date. But the prediction can only approximate what will happen when the code actually runs.
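
To illustrate point (1), here is a rough base-R sketch of how statically detected dependencies could be turned into an execution order. It is not drake's actual scheduler: the commands list, the helper functions named inside it (read_raw, clean, summarize_results), and resolve_order() are all hypothetical and exist only for this example.

```r
# Hypothetical mini-plan: target names mapped to quoted commands.
# The commands are never evaluated, only inspected.
commands <- list(
  report   = quote(summarize_results(fit, hist)),
  fit      = quote(lm(Sepal.Width ~ Petal.Width, data)),
  hist     = quote(create_plot(data)),
  data     = quote(clean(raw_data)),
  raw_data = quote(read_raw())
)

# A target depends on every other target whose name appears in its command.
# Names like Petal.Width are dropped here because they are not targets.
deps <- lapply(commands, function(cmd) intersect(all.vars(cmd), names(commands)))

# Repeatedly pick the targets whose dependencies have all been scheduled.
resolve_order <- function(deps) {
  done <- character(0)
  remaining <- deps
  while (length(remaining) > 0) {
    is_ready <- vapply(remaining, function(d) all(d %in% done), logical(1))
    ready <- names(remaining)[is_ready]
    if (length(ready) == 0) stop("circular dependency among targets")
    done <- c(done, ready)
    remaining <- remaining[setdiff(names(remaining), ready)]
  }
  done
}

build_order <- resolve_order(deps)
print(build_order)
```

Because the order comes from reading the code rather than running it, a name that merely looks like a dependency can influence the graph, which is the kind of approximation described above.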

As I describe in #578, non-standard evaluation (NSE) variables like Petal.Width do not get special treatment. In drake >= 6.0.0, this strange behavior is toned down a bit because drake only keeps dependencies that are objects in your environment or targets in your plan. (See #519 and #522 for details). In the reprex below, notice what happens when I define Petal.Width as an actual variable in the environment.

library(drake)
packageVersion("drake")
#> [1] '6.1.0.9000'
load_main_example() # New in 6.1.0
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
make(plan)
#> target raw_data
#> target data
#> target fit
#> target hist
#> target report
config <- drake_config(plan)
vis_drake_graph(config)

Petal.Width <- 123
config <- drake_config(plan)
vis_drake_graph(config)

Created on 2018-11-10 by the reprex package (v0.2.1)
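
The filtering idea can be sketched outside of drake as well. The helper below, keep_known(), is hypothetical and only approximates the behavior described above: it takes the names found by codetools::findGlobals() and keeps those that are objects in a given environment or targets in a plan. drake's real implementation may differ in its details.

```r
library(codetools)

# Same body as create_plot() in the main example; only inspected, never called.
create_plot <- function(data) {
  ggplot(data, aes(x = Petal.Width, fill = Species)) +
    geom_histogram(binwidth = 0.25) +
    theme_gray(20)
}

# Hypothetical helper: keep only the candidate globals that are defined
# directly in `envir` or that are named as targets.
keep_known <- function(fun, envir, targets = character(0)) {
  candidates <- findGlobals(fun)
  in_envir <- vapply(candidates, exists, logical(1),
                     envir = envir, inherits = FALSE)
  candidates[in_envir | candidates %in% targets]
}

# An isolated environment stands in for the user's workspace.
env <- new.env(parent = emptyenv())
before <- keep_known(create_plot, env)

# Defining Petal.Width as a real object makes it count as a dependency.
assign("Petal.Width", 123, envir = env)
after <- keep_known(create_plot, env)

print(before)
print(after)
```

This mirrors the reprex: nothing is kept until an object called Petal.Width actually exists, at which point it is treated as a dependency even though the function only uses it as a column name.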

Does that make sense?

namarks (Author) commented Nov 11, 2018

Thanks for the detailed explanation @wlandau, makes total sense. Of course, I was running an out-of-date version of drake.
