Search for duplicates among the existing issues, both open and closed.
Proposal
I found a case where a dynamic target came very close to finishing but never completed, even though I still had workers up and waiting for work. I suspect that targets were allocated to workers that then disappeared when they hit the HPC time limit. I would have liked drake to recognise that a worker had disappeared and then send its target to another worker that was still around.
I believe this would require clustermq to be able to report which workers have disappeared (via SLURM, in my case).
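To make the requested behaviour concrete, here is a minimal sketch in Python of a scheduler that requeues a target when its worker goes silent. It is purely illustrative and is not how drake or clustermq work: the `Scheduler` class, its method names, and the heartbeat-with-timeout mechanism are all assumptions made for this example.

```python
from collections import deque


class Scheduler:
    """Toy scheduler (hypothetical): requeues targets whose worker went silent."""

    def __init__(self, targets, timeout):
        self.pending = deque(targets)  # targets not yet assigned
        self.running = {}              # worker id -> target in progress
        self.last_seen = {}            # worker id -> time of last heartbeat
        self.done = []                 # finished targets
        self.timeout = timeout         # seconds of silence before a worker is lost

    def heartbeat(self, worker, now):
        """Record that a worker is still alive."""
        self.last_seen[worker] = now

    def assign(self, worker, now):
        """Give an idle worker the next pending target, if there is one."""
        self.heartbeat(worker, now)
        if worker in self.running or not self.pending:
            return None
        target = self.pending.popleft()
        self.running[worker] = target
        return target

    def finish(self, worker, now):
        """Mark the worker's current target as done."""
        self.heartbeat(worker, now)
        self.done.append(self.running.pop(worker))

    def reap(self, now):
        """Requeue targets held by workers that have been silent too long."""
        lost = [w for w, t in self.last_seen.items() if now - t > self.timeout]
        for worker in lost:
            del self.last_seen[worker]
            target = self.running.pop(worker, None)
            if target is not None:
                self.pending.appendleft(target)  # retry it first
        return lost


# Usage: worker_a hits its time limit while holding target_1.
sched = Scheduler(["target_1", "target_2"], timeout=30)
sched.assign("worker_a", now=0)
sched.assign("worker_b", now=0)
sched.finish("worker_b", now=10)
sched.heartbeat("worker_b", now=40)
lost = sched.reap(now=45)                      # worker_a has been silent for 45 s
reassigned = sched.assign("worker_b", now=46)  # worker_b picks up target_1
```

The key requirement is the same one raised above: the scheduler needs some signal, here a heartbeat, that tells it which workers are gone and which target each one was holding.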
Unfortunately, drake has no way of knowing which clustermq workers stopped unexpectedly or which target was running at the time. Maybe follow up on mschubert/clustermq#101.