
Dynamically scale clustermq workers #751

Closed
bart1 opened this issue Feb 22, 2019 · 5 comments

Comments

@bart1

bart1 commented Feb 22, 2019

I have run into an issue where I use more resources than needed. I have a set of simulations that must be summarized before the next set starts, and this loop iterates a few times. I recently switched to clustermq to future-proof my simulations. The problem is that all the workers are persistent, so they stay alive until the last simulation finishes. Given the variation in simulation length, this wastes resources. Is there a way to make the workers non-persistent? I will also run into trouble when the total run takes longer than the maximum allowed walltime; non-persistent workers would address that as well.

@wlandau
Member

wlandau commented Feb 22, 2019

Fortunately, make(parallelism = "future") already uses transient workers (non-persistent, one per target). The details are here. HPC timeouts should be less of a problem, and you can define the resources for each target as described here. See the future and future.batchtools packages to learn more about configuring "future" parallelism with your cluster.
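A minimal sketch of that setup (the plan contents are placeholders; `future::plan(multisession)` launches local R processes for illustration, whereas on a real cluster you would select a future.batchtools backend instead):

```r
library(drake)
library(future)

# Choose how workers are launched. multisession uses local R
# processes; on a cluster you would pick a future.batchtools
# backend matching your scheduler instead.
plan(multisession)

# A toy plan: each target runs on its own transient worker.
simulation_plan <- drake_plan(
  sims = runif(100),
  summary = mean(sims)
)

# One transient worker per target, at most 2 running at a time.
# Each worker terminates as soon as its target finishes.
make(simulation_plan, parallelism = "future", jobs = 2)
```

Because workers are transient, a short target releases its resources immediately instead of idling until the longest target completes.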

@wlandau wlandau closed this as completed Feb 22, 2019
@bart1
Author

bart1 commented Feb 22, 2019

Thanks! Maybe I misinterpreted #561; I was under the impression that the future backend would also disappear.

@wlandau
Member

wlandau commented Feb 22, 2019

You interpreted the beginning of #561 correctly. Farther along in that thread, though, I realized we still need the future backend precisely because of the issues you raised. make(parallelism = "future") is here to stay.

@brendanf
Contributor

Would it be possible to kill the extra workers once make has assigned all the remaining tasks?

@wlandau
Member

wlandau commented Feb 25, 2019

Both parallel backends actually do this already, even clustermq.

library(drake)
# Two targets: A finishes almost instantly, B sleeps for 12 seconds.
plan <- drake_plan(A = 1, B = Sys.sleep(12))
# Placeholders: substitute your scheduler name and template file.
options(clustermq.scheduler = "your_scheduler", clustermq.template = "your_template.tmpl")
# Launch two persistent clustermq workers to build the plan.
make(plan, parallelism = "clustermq", jobs = 2)

When I run the example above on SGE, both workers start at first, but one of them terminates shortly after target A finishes, because no more tasks remain to assign to it.

3 participants