
Dynamically scale clustermq workers #751

Closed
bart1 opened this issue Feb 22, 2019 · 5 comments

Comments

@bart1

bart1 commented Feb 22, 2019

I have run into an issue where I use more resources than needed. I have a set of simulations that must be summarized before the next set starts, and this loop iterates a few times. I recently switched to clustermq to future-proof my simulations. The problem is that all the workers are persistent, so they stay alive until the last simulation finishes. Given the variation in simulation length, this wastes resources. Is there a way to make the workers non-persistent? I will also run into trouble when the total run takes longer than the maximum allowed walltime; non-persistent workers would address that as well.

@wlandau
Member

wlandau commented Feb 22, 2019

Fortunately, make(parallelism = "future") already uses transient workers (non-persistent, one per target). The details are here. HPC timeouts should be less of a problem, and you can define the resources for each target as described here. See the future and future.batchtools packages to learn more about configuring "future" parallelism with your cluster.
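A minimal sketch of that setup (the plan contents are placeholders; `future::plan(multisession)` launches local R processes for illustration, whereas on a real cluster you would select a future.batchtools backend instead):

```r
library(drake)
library(future)

# Choose how workers are launched. multisession uses local R
# processes; on a cluster you would pick a future.batchtools
# backend matching your scheduler instead.
plan(multisession)

# A toy plan: each target runs on its own transient worker.
simulation_plan <- drake_plan(
  sims = runif(100),
  summary = mean(sims)
)

# One transient worker per target, at most 2 running at a time.
# Each worker terminates as soon as its target finishes.
make(simulation_plan, parallelism = "future", jobs = 2)
```

Because workers are transient, a short target releases its resources immediately instead of idling until the longest target completes.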

@wlandau wlandau closed this as completed Feb 22, 2019
@bart1
Author

bart1 commented Feb 22, 2019

Thanks! Maybe I misinterpreted #561; I was under the impression that the future backend would also disappear.

@wlandau
Member

wlandau commented Feb 22, 2019

You interpreted the beginning of #561 correctly. Farther along in that thread, though, I realized we still need the future backend precisely because of the issues you raised. make(parallelism = "future") is here to stay.

@brendanf
Contributor

Would it be possible to kill the extra workers once make has assigned all the remaining tasks?

@wlandau
Member

wlandau commented Feb 25, 2019

Both parallel backends actually do this already, even clustermq.

library(drake)
# Two targets: A finishes almost instantly, B sleeps for 12 seconds.
plan <- drake_plan(A = 1, B = Sys.sleep(12))
# Placeholders: substitute your scheduler name and template file.
options(clustermq.scheduler = "your_scheduler", clustermq.template = "your_template.tmpl")
# Launch two persistent clustermq workers to build the plan.
make(plan, parallelism = "clustermq", jobs = 2)

When I run the example above on SGE, both workers start at first, but one of them terminates shortly after target A finishes, because no more tasks remain to assign to it.

3 participants