Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid hanging processes in the lapply-like backends #373

Merged
merged 3 commits into from
May 3, 2018
Merged

Conversation

wlandau
Copy link
Member

@wlandau wlandau commented May 3, 2018

Summary

Previously, there were problems with hanging processes in the mclapply and parLapply backends when one of the processes exited in error. In this PR, I use on.exit() to enforce the following.

  1. If the master dies, all the workers die.
  2. If all the workers die, the master dies.

Also, the error log of the process is cached in the "mc_fail" storr namespace. For example, retrieve with config$cache$get("2", namespace = "mc_fail") for worker 2 and config$cache$get("0", namespace = "mc_fail") for master. I haven't wrote a friendly API function to retrieve these types of errors, and I currently do not have plans for one. This error caching is mostly just to help me and the contributors debug. Will revisit if these errors come up a lot, but I am expecting most errors to be related to targets once #369 is solved, and there is already error handling for that.

Currently affects mclapply and parLapply parallelism, and will affect future_lapply parallelism going forward (hopefully soon).

GitHub issues addressed

Checklist

  • I have read drake's code of conduct, and I agree to follow its rules.
  • I have read the guidelines for contributing.
  • I have listed any substantial changes in the development news.
  • I have added testthat unit tests to tests/testthat to confirm that any new features or functionality work correctly.
  • I have tested this pull request locally with devtools::check()
  • This pull request is ready for review.
  • I think this pull request is ready to merge.

@wlandau wlandau self-assigned this May 3, 2018
@codecov-io
Copy link

codecov-io commented May 3, 2018

Codecov Report

Merging #373 into master will increase coverage by 0.09%.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #373      +/-   ##
========================================
+ Coverage    99.9%   100%   +0.09%     
========================================
  Files          66     66              
  Lines        5467   5485      +18     
========================================
+ Hits         5462   5485      +23     
+ Misses          5      0       -5
Impacted Files Coverage Δ
R/handlers.R 100% <ø> (ø) ⬆️
R/mclapply.R 100% <100%> (ø) ⬆️
R/migrate.R 100% <0%> (+2.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ba0642...2efaf27. Read the comment docs.

@wlandau wlandau merged commit 2cbbade into master May 3, 2018
@wlandau wlandau deleted the avoid_hang branch May 3, 2018 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants