send a target to a second worker in clustermq parallelism #1287

Closed
2 tasks done
kendonB opened this issue Jun 25, 2020 · 1 comment

@kendonB
Contributor

kendonB commented Jun 25, 2020

Prework

Proposal

I found a case where a dynamic target came very close to finishing but stalled, even though I still had workers up and waiting for work. I suspect that targets were allocated to workers that then disappeared when they hit the HPC time limit. What I would have liked is for drake to recognise that a worker has disappeared and send its target to another worker that is still running.

I believe this would require clustermq to be able to report which workers have disappeared (via SLURM, in my case).

@wlandau
Member

wlandau commented Jun 25, 2020

Unfortunately, drake has no way of knowing which clustermq workers stopped unexpectedly or which target was running at the time. Maybe follow up on mschubert/clustermq#101.

wlandau closed this as completed Jun 25, 2020