
Dynamic branching slowness #1089

Closed
billdenney opened this issue Dec 4, 2019 · 12 comments
Comments

@billdenney
Contributor

Prework

Proposal

I have a dynamic target with a list of ~11,000 data.frames that takes up ~16 MB. When it starts to build the dynamic target, it takes a bit of time (~10 seconds) to move from the dynamic d_long_tobit line to the first sub-target d_long_tobit_b2a80233. I assume that the time is taken to calculate a hash (b2a80233) of the target.

Would it be possible to cache the hash calculation (as though it were another target) for the dynamic targets? (Or, am I wrong about where the time is going? I did a quick look into the code, but I got lost before I found my way to the hash calculation.)
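The caching idea proposed above could look roughly like the following memoization sketch. This is purely illustrative, not drake's internals; `make_memoized` and `slow_hash` are hypothetical names:

```r
# Hypothetical sketch of the proposed idea: cache each target's hash the
# first time it is computed so repeated lookups skip the expensive call.
make_memoized <- function(fun) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fun(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

calls <- 0L
slow_hash <- function(x) {
  calls <<- calls + 1L  # count how often the real work runs
  paste0("hash_", x)    # stand-in for a real digest of the target
}
memo_hash <- make_memoized(slow_hash)

memo_hash("d_long_tobit")  # computes and caches
memo_hash("d_long_tobit")  # served from the cache; calls stays 1
```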

@wlandau
Member

wlandau commented Dec 4, 2019

Let's try to pinpoint the bottleneck. Would you post some profiling data as a zip archive to this thread? Then, we can look at some flame graphs like in #1086 (comment). Different systems have different bottlenecks, so it is helpful to have profiling data directly from your system. (By the way, what OS and R version are you using?)

library(drake)

list_of_data_frames <- list(...)

plan <- drake_plan(
  # Let's try to isolate the slow part.
  d_long_tobit = target(NULL, dynamic = map(list_of_data_frames))
)

Rprof(filename = "samples.rprof")
make(plan) # Do you see the same delay?
Rprof(NULL)
zip(zipfile = "samples.zip", files = "samples.rprof") # Upload samples.zip to this thread.

I suspect the bottleneck is here:

subtargets <- subtargets[target_missing(subtargets, config)]

Specifically here:

drake/R/drake_meta_.R

Lines 72 to 79 in 687ee98

target_missing <- function(target, config) {
  !target_exists(target, config)
}

target_exists <- function(target, config) {
  config$cache$exists(key = target) &
    config$cache$exists(key = target, namespace = "meta")
}

When it comes time to register sub-targets, drake needs to check for the existence of every sub-target in the cache. Interactions with the file system tend to be the slowest parts of the workflow. But ultimately, we need profiling data to take action. Bottlenecks are notoriously hard to guess without hard evidence.
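To illustrate why those per-key existence checks can add up, here is an analogy (not drake's actual code path): storr's RDS driver backs keys with files, so existence checks ultimately reduce to file lookups, and doing them one key at a time incurs one R-level call per sub-target.

```r
# Simulate 1,000 sub-target keys, half of which already "exist" on disk.
dir <- tempfile("fake_cache_")
dir.create(dir)
keys <- sprintf("subtarget_%04d", seq_len(1000))
invisible(file.create(file.path(dir, head(keys, 500))))

# One R-level call per key, roughly what a per-target loop incurs:
one_by_one <- vapply(
  keys,
  function(k) file.exists(file.path(dir, k)),
  logical(1)
)

# The same answer from a single vectorized call:
all_at_once <- file.exists(file.path(dir, keys))

identical(unname(one_by_one), all_at_once)  # TRUE
unlink(dir, recursive = TRUE)
```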

@billdenney
Contributor Author

I'm using R 3.6.1 on Windows 10, and I'm running within the current RStudio desktop.

Just counting the seconds, it took about 10 seconds before the first target was listed, and it took 16 seconds from the first target being listed to starting the first subtarget, and then it took a long time (16 minutes) to compute all of the subtargets.

In my use case here, it takes a long time to calculate the results (minutes to hours) due to the large number of targets, but each individual target is relatively fast (seconds). I wonder if there could be some form of disk-write optimization where targets accumulate in a memory cache and are only written to disk when the cache passes a configurable size (e.g. 128 MB), a time limit elapses (e.g. 10 minutes), or make() exits.
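A toy sketch of that buffering idea (hypothetical, not a drake feature; `make_buffer` and its arguments are made up):

```r
# Accumulate values in memory and flush them in one batch only when the
# buffer passes a size threshold (or when flushing is forced, e.g. on exit).
make_buffer <- function(flush, max_bytes = 128 * 1024^2) {
  items <- list()
  add <- function(name, value) {
    items[[name]] <<- value
    if (as.numeric(object.size(items)) > max_bytes) {
      flush(items)
      items <<- list()
    }
  }
  force_flush <- function() {
    if (length(items) > 0) flush(items)
    items <<- list()
  }
  list(add = add, flush = force_flush)
}

# Usage: this "flush" just records batches instead of writing files.
batches <- list()
buf <- make_buffer(function(x) batches[[length(batches) + 1]] <<- x,
                   max_bytes = 200)
buf$add("a", 1:100)  # larger than 200 bytes, so this triggers a flush
buf$flush()          # nothing left to write
length(batches)      # 1
```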

Here is the Rprof output:
samples.zip

At a quick look, it does appear that you're right that most of the time is spent on file system work. Here is everything that took at least 10 seconds:

> summaryRprof("samples.rprof")
$by.self
                             self.time self.pct total.time total.pct
"%||%"                          256.32    26.38     260.70     26.83
"file.rename"                   141.22    14.53     142.10     14.63
"file"                           50.10     5.16      52.22      5.37
"unlink"                         36.72     3.78      41.10      4.23
"$"                              34.18     3.52      46.54      4.79
"%||NA%"                         30.54     3.14      32.36      3.33
"close.connection"               29.34     3.02      29.34      3.02
"subtarget_deps"                 28.28     2.91      30.66      3.16
"unclass"                        27.70     2.85      27.70      2.85
"load_by_as_subdep"              26.92     2.77      27.76      2.86
"file.exists"                    25.36     2.61      30.58      3.15
"load_subtarget_subdeps"         23.66     2.44     153.36     15.78
"FUN"                            20.08     2.07     101.14     10.41
"tempfile"                       14.78     1.52      16.34      1.68
"assert_output_files"            14.06     1.45      14.22      1.46
"<Anonymous>"                    13.96     1.44     436.30     44.90
"with_call_stack"                12.98     1.34      21.26      2.19
"assign_format"                  12.72     1.31      13.04      1.34
"normalizePath"                  11.88     1.22      11.94      1.23

@wlandau
Member

wlandau commented Dec 4, 2019

Most of the file-system-related slowness is probably due to richfitz/storr#116. Trying to solve that with richfitz/storr#117.

I just pushed a couple of commits to try to attack %||% and %||NA%. We will see how much that helps.

Otherwise, here is the overall landscape of runtime.

[Screenshot: flame graph of the overall runtime breakdown]

You may see some speed improvements if you disable history (5% of runtime), progress (7%), logging to the console (7%), and prep for data recovery (recoverable in make(), 7%).

But all that is in the overwhelming 16 minutes of execution. To try to isolate that initial 26-second delay, I think we should try to force those sub-targets to fail instead of build.

library(drake)

list_of_data_frames <- list(...)

plan <- drake_plan(
  d_long_tobit = target(stop("sub-target"), dynamic = map(list_of_data_frames))
)

Rprof(filename = "samples.rprof")
try(make(plan)) # keep_going is FALSE by default, so this should stop early.
Rprof(NULL)
zip(zipfile = "samples.zip", files = "samples.rprof")

What I suspect (but don't know for sure) is that register_subtargets() will emerge in the flame graph. Here is what it looks like in the current flame graph.

[Screenshot: current flame graph showing the suspected bottlenecks]

You'll notice Sys.info() is a bottleneck in digest(), which is being addressed at eddelbuettel/digest#137. As for %||%, hopefully the last couple commits sped it up (%|||% should be faster).

Other than that, it is a bit hard for me to tell what the bottlenecks are in subtarget_deps() and register_subtarget_layout(). Hopefully the abridged profiling study above will have more targeted samples.
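For context on the %||% vs. %|||% distinction mentioned above, here is a sketch of the two operators (my reading of the change; the exact drake definitions may differ slightly):

```r
# The original operator falls back when the value is NULL *or* empty,
# which costs an extra length() check on every call:
`%||%` <- function(x, y) if (is.null(x) || length(x) == 0L) y else x

# The leaner variant falls back only on NULL:
`%|||%` <- function(x, y) if (is.null(x)) y else x

NULL %||% "default"          # "default"
character(0) %||% "default"  # "default"
character(0) %|||% "default" # character(0): empty values pass through
```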

@billdenney
Contributor Author

I recently pushed several efficiency gains for digest due to wanting it faster for large data.frames.

To help with this issue, here is a self-contained reprex:

library(drake)
data_for_half_life <-
  lapply(
    X=1:11000,
    FUN=function(...) data.frame(a=rnorm(10), b=rnorm(10), c=rnorm(10), d=rnorm(10))
  )

plan <- drake_plan(
  # Let's try to isolate the slow part.
  d_long_tobit = target(stop("sub-target"), dynamic = map(data_for_half_life))
)

Rprof(filename = "samples.rprof")
try(make(plan)) # The sub-targets stop() on purpose; try() lets Rprof(NULL) still run.
Rprof(NULL)

And, here is the zip of the Rprof from that.
samples2.zip

@wlandau
Member

wlandau commented Dec 4, 2019

I recently pushed several efficiency gains for digest due to wanting it faster for large data.frames.

Nice! You can also use target(format = "fst") if your target is a data frame and it will store a lot faster.
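For reference, opting into the fst format looks like this in a plan (a sketch; it assumes the fst package is installed):

```r
library(drake)

plan <- drake_plan(
  big_result = target(
    data.frame(x = rnorm(1e6)),
    format = "fst"  # serialize this data frame with fst instead of RDS
  )
)
```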

@billdenney
Contributor Author

target(format = "fst") if your target is a data frame

Is it possible for drake to automatically detect that? (You could consider that a sub-feature request here.)

@wlandau
Member

wlandau commented Dec 4, 2019

Someday another format may surpass fst, and replacing the default behavior would break existing workflows. Letting the user decide is more future-proof.

@wlandau
Member

wlandau commented Dec 4, 2019

By the way, your example code really helps! I was able to reproduce and mitigate some of the same bottlenecks you are seeing. Care to try ed13af1?

@wlandau wlandau changed the title Feature Request: Cache the hashes for dynamic branching Dynamic branching slowness on Windows Dec 4, 2019
@wlandau wlandau changed the title Dynamic branching slowness on Windows Dynamic branching slowness Dec 4, 2019
@billdenney
Contributor Author

You cut half the time out!

Using the example in #1089 (comment), I got this new Rprof result.
samples_new.zip

@wlandau wlandau mentioned this issue Dec 4, 2019
@wlandau
Member

wlandau commented Dec 4, 2019

Awesome! #1092 and eddelbuettel/digest@7d60b5b are likely to speed it up even more.

@wlandau
Member

wlandau commented Dec 4, 2019

Yup, I think those digest-related speedups will be enough to close this issue. Here is your up-to-date flame graph:

[Screenshot: up-to-date flame graph after the digest-related speedups]

@billdenney
Contributor Author

We're good. Thank you!
