
Dynamic branching slowness #1089

Closed
billdenney opened this issue Dec 4, 2019 · 12 comments
Comments

@billdenney
Contributor

Prework

Proposal

I have a dynamic target with a list of ~11,000 data.frames that takes up ~16 MB. When it starts to build the dynamic target, it takes a bit of time (~10 seconds) to move from the dynamic d_long_tobit line to the first sub-target d_long_tobit_b2a80233. I assume that the time is taken to calculate a hash (b2a80233) of the target.

Would it be possible to cache the hash calculation (as though it were another target) for the dynamic targets? (Or, am I wrong about where the time is going? I did a quick look into the code, but I got lost before I found my way to the hash calculation.)
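The caching idea proposed above could look roughly like the following memoization sketch. This is purely illustrative, not drake's internals; `make_memoized` and `slow_hash` are hypothetical names:

```r
# Hypothetical sketch of the proposed idea: cache each target's hash the
# first time it is computed so repeated lookups skip the expensive call.
make_memoized <- function(fun) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fun(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

calls <- 0L
slow_hash <- function(x) {
  calls <<- calls + 1L  # count how often the real work runs
  paste0("hash_", x)    # stand-in for a real digest of the target
}
memo_hash <- make_memoized(slow_hash)

memo_hash("d_long_tobit")  # computes and caches
memo_hash("d_long_tobit")  # served from the cache; calls stays 1
```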

@wlandau
Member

wlandau commented Dec 4, 2019

Let's try to pinpoint the bottleneck. Would you post some profiling data as a zip archive to this thread? Then, we can look at some flame graphs like in #1086 (comment). Different systems have different bottlenecks, so it is helpful to have profiling data directly from your system. (By the way, what OS and R version are you using?)

library(drake)

list_of_data_frames <- list(...)

plan <- drake_plan(
  # Let's try to isolate the slow part.
  d_long_tobit = target(NULL, dynamic = map(list_of_data_frames))
)

Rprof(filename = "samples.rprof")
make(plan) # Do you see the same delay?
Rprof(NULL)
zip(zipfile = "samples.zip", files = "samples.rprof") # Upload samples.zip to this thread.

I suspect the bottleneck is here:

subtargets <- subtargets[target_missing(subtargets, config)]

Specifically here:

drake/R/drake_meta_.R

Lines 72 to 79 in 687ee98

target_missing <- function(target, config) {
  !target_exists(target, config)
}

target_exists <- function(target, config) {
  config$cache$exists(key = target) &
    config$cache$exists(key = target, namespace = "meta")
}

When it comes time to register sub-targets, drake needs to check for the existence of every sub-target in the cache. Interactions with the file system tend to be the slowest parts of the workflow. But ultimately, we need profiling data to take action. Bottlenecks are notoriously hard to guess without hard evidence.
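To illustrate why those per-key existence checks can add up, here is an analogy (not drake's actual code path): storr's RDS driver backs keys with files, so existence checks ultimately reduce to file lookups, and doing them one key at a time incurs one R-level call per sub-target.

```r
# Simulate 1,000 sub-target keys, half of which already "exist" on disk.
dir <- tempfile("fake_cache_")
dir.create(dir)
keys <- sprintf("subtarget_%04d", seq_len(1000))
invisible(file.create(file.path(dir, head(keys, 500))))

# One R-level call per key, roughly what a per-target loop incurs:
one_by_one <- vapply(
  keys,
  function(k) file.exists(file.path(dir, k)),
  logical(1)
)

# The same answer from a single vectorized call:
all_at_once <- file.exists(file.path(dir, keys))

identical(unname(one_by_one), all_at_once)  # TRUE
unlink(dir, recursive = TRUE)
```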

@billdenney
Contributor Author

I'm using R 3.6.1 on Windows 10, and I'm running within the current RStudio desktop.

Just counting the seconds, it took about 10 seconds before the first target was listed, and it took 16 seconds from the first target being listed to starting the first subtarget, and then it took a long time (16 minutes) to compute all of the subtargets.

In my use case here, it takes a long time to calculate the results (minutes to hours) due to the large number of targets, but each individual target is relatively fast (seconds). I wonder if there could be some form of disk-write optimization where targets accumulate in a memory cache and are only written to disk when the cache passes a configurable size (e.g. 128 MB), a time limit elapses (e.g. 10 minutes), or make() exits.
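A toy sketch of that buffering idea (hypothetical, not a drake feature; `make_buffer` and its arguments are made up):

```r
# Accumulate values in memory and flush them in one batch only when the
# buffer passes a size threshold (or when flushing is forced, e.g. on exit).
make_buffer <- function(flush, max_bytes = 128 * 1024^2) {
  items <- list()
  add <- function(name, value) {
    items[[name]] <<- value
    if (as.numeric(object.size(items)) > max_bytes) {
      flush(items)
      items <<- list()
    }
  }
  force_flush <- function() {
    if (length(items) > 0) flush(items)
    items <<- list()
  }
  list(add = add, flush = force_flush)
}

# Usage: this "flush" just records batches instead of writing files.
batches <- list()
buf <- make_buffer(function(x) batches[[length(batches) + 1]] <<- x,
                   max_bytes = 200)
buf$add("a", 1:100)  # larger than 200 bytes, so this triggers a flush
buf$flush()          # nothing left to write
length(batches)      # 1
```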

Here is the Rprof output:
samples.zip

At a quick look, it does appear that you're right that most of the time is spent on file system work. Here is everything that took at least 10 seconds:

> summaryRprof("samples.rprof")
$by.self
                             self.time self.pct total.time total.pct
"%||%"                          256.32    26.38     260.70     26.83
"file.rename"                   141.22    14.53     142.10     14.63
"file"                           50.10     5.16      52.22      5.37
"unlink"                         36.72     3.78      41.10      4.23
"$"                              34.18     3.52      46.54      4.79
"%||NA%"                         30.54     3.14      32.36      3.33
"close.connection"               29.34     3.02      29.34      3.02
"subtarget_deps"                 28.28     2.91      30.66      3.16
"unclass"                        27.70     2.85      27.70      2.85
"load_by_as_subdep"              26.92     2.77      27.76      2.86
"file.exists"                    25.36     2.61      30.58      3.15
"load_subtarget_subdeps"         23.66     2.44     153.36     15.78
"FUN"                            20.08     2.07     101.14     10.41
"tempfile"                       14.78     1.52      16.34      1.68
"assert_output_files"            14.06     1.45      14.22      1.46
"<Anonymous>"                    13.96     1.44     436.30     44.90
"with_call_stack"                12.98     1.34      21.26      2.19
"assign_format"                  12.72     1.31      13.04      1.34
"normalizePath"                  11.88     1.22      11.94      1.23

@wlandau
Member

wlandau commented Dec 4, 2019

Most of the file-system-related slowness is probably due to richfitz/storr#116. Trying to solve that with richfitz/storr#117.

I just pushed a couple of commits to try to attack %||% and %||NA%. We will see how much that helps.

Otherwise, here is the overall landscape of runtime.

[Screenshot: flame graph of the overall runtime breakdown]

You may see some speed improvements if you disable history (5% of runtime), progress (7%), logging to the console (7%), and prep for data recovery (recoverable in make(), 7%).

But all that is in the overwhelming 16 minutes of execution. To try to isolate that initial 26-second delay, I think we should try to force those sub-targets to fail instead of build.

library(drake)

list_of_data_frames <- list(...)

plan <- drake_plan(
  d_long_tobit = target(stop("sub-target"), dynamic = map(list_of_data_frames))
)

Rprof(filename = "samples.rprof")
try(make(plan)) # keep_going is FALSE by default, so this should stop early.
Rprof(NULL)
zip(zipfile = "samples.zip", files = "samples.rprof")

What I suspect (but don't know for sure) is that register_subtargets() will emerge in the flame graph. Here is what it looks like in the current flame graph.

[Screenshot: current flame graph showing the suspected bottlenecks]

You'll notice Sys.info() is a bottleneck in digest(), which is being addressed at eddelbuettel/digest#137. As for %||%, hopefully the last couple commits sped it up (%|||% should be faster).

Other than that, it is a bit hard for me to tell what the bottlenecks are in subtarget_deps() and register_subtarget_layout(). Hopefully the abridged profiling study above will have more targeted samples.
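For context on the %||% vs. %|||% distinction mentioned above, here is a sketch of the two operators (my reading of the change; the exact drake definitions may differ slightly):

```r
# The original operator falls back when the value is NULL *or* empty,
# which costs an extra length() check on every call:
`%||%` <- function(x, y) if (is.null(x) || length(x) == 0L) y else x

# The leaner variant falls back only on NULL:
`%|||%` <- function(x, y) if (is.null(x)) y else x

NULL %||% "default"          # "default"
character(0) %||% "default"  # "default"
character(0) %|||% "default" # character(0): empty values pass through
```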

@billdenney
Contributor Author

I recently pushed several efficiency gains for digest due to wanting it faster for large data.frames.

To help with this issue, here is a self-contained reprex:

library(drake)
data_for_half_life <-
  lapply(
    X=1:11000,
    FUN=function(...) data.frame(a=rnorm(10), b=rnorm(10), c=rnorm(10), d=rnorm(10))
  )

plan <- drake_plan(
  # Let's try to isolate the slow part.
  d_long_tobit = target(stop("sub-target"), dynamic = map(data_for_half_life))
)

Rprof(filename = "samples.rprof")
try(make(plan)) # The sub-targets stop() on purpose; try() lets Rprof(NULL) still run.
Rprof(NULL)

And, here is the zip of the Rprof from that.
samples2.zip

@wlandau
Member

wlandau commented Dec 4, 2019

I recently pushed several efficiency gains for digest due to wanting it faster for large data.frames.

Nice! You can also use target(format = "fst") if your target is a data frame and it will store a lot faster.
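For reference, opting into the fst format looks like this in a plan (a sketch; it assumes the fst package is installed):

```r
library(drake)

plan <- drake_plan(
  big_result = target(
    data.frame(x = rnorm(1e6)),
    format = "fst"  # serialize this data frame with fst instead of RDS
  )
)
```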

@billdenney
Contributor Author

target(format = "fst") if your target is a data frame

Is it possible for drake to automatically detect that? (You could consider that a sub-feature request here.)

@wlandau
Member

wlandau commented Dec 4, 2019

Someday another format may surpass fst, and replacing the default behavior would break existing workflows. Letting the user decide is more future-proof.

@wlandau
Member

wlandau commented Dec 4, 2019

By the way, your example code really helps! I was able to reproduce and mitigate some of the same bottlenecks you are seeing. Care to try ed13af1?

@wlandau wlandau changed the title Feature Request: Cache the hashes for dynamic branching Dynamic branching slowness on Windows Dec 4, 2019
@wlandau wlandau changed the title Dynamic branching slowness on Windows Dynamic branching slowness Dec 4, 2019
@billdenney
Contributor Author

You cut half the time out!

Using the example in #1089 (comment), I got this new Rprof result.
samples_new.zip

@wlandau wlandau mentioned this issue Dec 4, 2019
@wlandau
Member

wlandau commented Dec 4, 2019

Awesome! #1092 and eddelbuettel/digest@7d60b5b are likely to speed it up even more.

@wlandau
Member

wlandau commented Dec 4, 2019

Yup, I think those digest-related speedups will be enough to close this issue. Here is your up-to-date flame graph:

[Screenshot: up-to-date flame graph after the digest-related speedups]

@billdenney
Contributor Author

We're good. Thank you!
