-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider removing or optionalizing the gc() in this while loop #454
Comments
And since persistent workers check if targets can be skipped, this may be yet another issue improved with #440. |
Then 2002653 should be an improvement. |
This is surely still going to be too much gc'ing for a lot of projects. gc'ing often takes 2-3 seconds and for projects with small targets this will add a lot of overhead. Perhaps your original solution could be modified to detect when a worker is idle for 30 seconds and only gc under those conditions and only once per idling event. Now that I've thought about it this seems like the obvious answer. |
Big improvements with 2002653 using future_lapply. Still took more than 25 mins to get through the up-to-date targets but I'm now I've been staring at the logs waiting for "target ..." to appear for 5 minutes. |
It looks like the vast majority of the overhead time causing the problem in #449 is spent in this while loop:
https://github.com/ropensci/drake/blob/master/R/mclapply.R#L75-L78
If I remove the
gc()
I see substantial speed improvements, though the master process certainly still can't keep up with the workers for the up-to-date targets which obviously process fast. A better solution would be to do a round of preprocessing targets using the same strategy asoutdated
so that the first round of targets sent infl_master
are out-of-date.The text was updated successfully, but these errors were encountered: