Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does the recommendation against using plan inside packages also prohibit default values? #274

Open
HughParsonage opened this issue Jan 27, 2019 · 1 comment
Labels

Comments

@HughParsonage
Copy link
Contributor

Thank you for this wonderful package. I have a question regarding the documentation in ?plan:

For package developers

Please refrain from modifying the future strategy inside your packages / functions, i.e. do not call plan() in your code. Instead, leave the control on what backend to use to the end user. This idea is part of the core philosophy of the future framework - as a developer you can never know what future backends the user have access to. Moreover, by not making any assumptions about what backends are available, your code will also work automatically will any new backends developed after you wrote your code.

My summary of this philosophy is that a package should never wrest control of plan from the end-user.

What is your advice or philosophy about using arguments to control plan and having default values? Essentially, is it okay to use plan("multiprocess"), provided "multisession" can be changed and doesn't conflict with an already established plan? For example, would it be acceptable within the future framework to have a default setting within a package function which uses the following logic?

foo <- function(..., planner = NULL) {
  if (is.null(planner)) {
    if (plan_is_already_invoked()) {
      NULL  # keep using whatever was running
    } else {
      plan(multiprocess)
    }
  } else {
    plan(planner)
  }
  <rest of function>
}

That is, if no planner is supplied to foo, use whatever is running; or, if none are, use "multiprocess".

@HenrikBengtsson
Copy link
Collaborator

Hi. This is a tricky question/topic. My main reason for that recommendation against using plan() inside code/packages is that it risks wreaking havoc. When done inside packages, and as more and more packages are using the future framework, it will not be obvious to users (also not to developers) when and if there will be a change of future plan.

Imagine a developer of PkgA that parallelize via futures and where PkgA depends of PkgB which depends on PkgC. Things have worked just fine for years, but all of a sudden PkgC decides to parallelize via futures as well. So far so good, but the moment PkgC starts to set plan() internally, things will not work as before for PkgA unless it cleans up after itself. At a minimum, if PkgC decides to/needs to set plan() internally, it really really should make sure it undo those changes when done. See #263 (comment) for an example how to do this safely.

Note: the future framework does protect against nested calls blows up via nested, recursive parallel code, i.e. if the caller is already running in parallel, then called code will default to plan(sequential).

That is, if no planner is supplied to foo, use whatever is running; or, if none are, use "multiprocess".

A few comments here. First of all, there is no such thing as "none". The default is plan(sequential), so a future plan is always set. (BTW, this default behavior fits well into the built-in protection against recursive parallel processing.)

What it sounds like you're arguing for is that you want the default behaviour to be running things in parallel rather than sequentially. This has be raised before and personally I'd argue that such a strategy works alright if you're the single user on the machine and you're only running a single software at the time. But, if you attempt to do any kind of multitasking at the same time, or run multiple analyzes at the same time, or there are other users on the same machine, the all-happy-so-fast parallel processing will quickly turn into a sluggish-gray-and-rainy experience.

Because of this, I think it is better to have the end user control how and if parallel processing should be done. The most common is to instruct the user to use plan(), but this can also be controlled via:

  • R option future.plan, e.g. set it .Rprofile
  • Environment variable R_FUTURE_PLAN, e.g. set it .Renviron
  • Command-line option --parallel=ncores to R/Rscript
  • Startup script ./.future.R sourced when the future package is loaded

See ?future::future.options for more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants