Does the recommendation against using `plan` inside packages also prohibit default values? #274

HughParsonage · 2019-01-27T23:43:08Z

Thank you for this wonderful package. I have a question regarding the documentation in ?plan:

For package developers

Please refrain from modifying the future strategy inside your packages / functions, i.e. do not call plan() in your code. Instead, leave the control on what backend to use to the end user. This idea is part of the core philosophy of the future framework - as a developer you can never know what future backends the user have access to. Moreover, by not making any assumptions about what backends are available, your code will also work automatically will any new backends developed after you wrote your code.

My summary of this philosophy is that a package should never wrest control of plan from the end-user.

What is your advice or philosophy about using arguments to control plan and having default values? Essentially, is it okay to use plan("multiprocess"), provided "multisession" can be changed and doesn't conflict with an already established plan? For example, would it be acceptable within the future framework to have a default setting within a package function which uses the following logic?

foo <- function(..., planner = NULL) {
  if (is.null(planner)) {
    if (plan_is_already_invoked()) {
      NULL  # keep using whatever was running
    } else {
      plan(multiprocess)
    }
  } else {
    plan(planner)
  }
  <rest of function>
}

That is, if no planner is supplied to foo, use whatever is running; or, if none are, use "multiprocess".

The text was updated successfully, but these errors were encountered:

HenrikBengtsson · 2019-02-08T19:32:24Z

Hi. This is a tricky question/topic. My main reason for that recommendation against using plan() inside code/packages is that it risks wreaking havoc. When done inside packages, and as more and more packages are using the future framework, it will not be obvious to users (also not to developers) when and if there will be a change of future plan.

Imagine a developer of PkgA that parallelize via futures and where PkgA depends of PkgB which depends on PkgC. Things have worked just fine for years, but all of a sudden PkgC decides to parallelize via futures as well. So far so good, but the moment PkgC starts to set plan() internally, things will not work as before for PkgA unless it cleans up after itself. At a minimum, if PkgC decides to/needs to set plan() internally, it really really should make sure it undo those changes when done. See #263 (comment) for an example how to do this safely.

Note: the future framework does protect against nested calls blows up via nested, recursive parallel code, i.e. if the caller is already running in parallel, then called code will default to plan(sequential).

That is, if no planner is supplied to foo, use whatever is running; or, if none are, use "multiprocess".

A few comments here. First of all, there is no such thing as "none". The default is plan(sequential), so a future plan is always set. (BTW, this default behavior fits well into the built-in protection against recursive parallel processing.)

What it sounds like you're arguing for is that you want the default behaviour to be running things in parallel rather than sequentially. This has be raised before and personally I'd argue that such a strategy works alright if you're the single user on the machine and you're only running a single software at the time. But, if you attempt to do any kind of multitasking at the same time, or run multiple analyzes at the same time, or there are other users on the same machine, the all-happy-so-fast parallel processing will quickly turn into a sluggish-gray-and-rainy experience.

Because of this, I think it is better to have the end user control how and if parallel processing should be done. The most common is to instruct the user to use plan(), but this can also be controlled via:

R option future.plan, e.g. set it .Rprofile
Environment variable R_FUTURE_PLAN, e.g. set it .Renviron
Command-line option --parallel=ncores to R/Rscript
Startup script ./.future.R sourced when the future package is loaded

See ?future::future.options for more details.

HenrikBengtsson added the question label Feb 6, 2019

jkurle mentioned this issue Sep 23, 2020

how to use plan() in package futureverse/doFuture#49

Closed

lorenzwalthert mentioned this issue Nov 28, 2020

Future for package developers and temporary overriding of plans #450

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Does the recommendation against using `plan` inside packages also prohibit default values? #274

Does the recommendation against using `plan` inside packages also prohibit default values? #274

HughParsonage commented Jan 27, 2019

For package developers

HenrikBengtsson commented Feb 8, 2019

Does the recommendation against using plan inside packages also prohibit default values? #274

Does the recommendation against using plan inside packages also prohibit default values? #274

Comments

HughParsonage commented Jan 27, 2019

For package developers

HenrikBengtsson commented Feb 8, 2019

Does the recommendation against using `plan` inside packages also prohibit default values? #274

Does the recommendation against using `plan` inside packages also prohibit default values? #274