Make options(future.globals.maxSize = +Inf) the default #762

Open
2 of 4 tasks
HenrikBengtsson opened this issue Feb 11, 2025 · 1 comment

HenrikBengtsson commented Feb 11, 2025

The current default is options(future.globals.maxSize = 500 * 1024^2), i.e. 500 MiB.
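For reference, the option is expressed in bytes, so the arithmetic and the proposed setting can be checked directly. This is a minimal sketch using base R only; the variable names `default_max`, `old`, and `current` are illustrative and not part of the future package.

```r
# The option is given in bytes; 1 MiB = 1024^2 bytes.
default_max <- 500 * 1024^2            # 524288000 bytes
default_max / 1024^2                   # 500 (MiB)

# Remember the previous setting, then disable the threshold entirely.
old <- options(future.globals.maxSize = +Inf)
current <- getOption("future.globals.maxSize")

# Restore the previous setting (or unset the option if it was unset).
options(old)
```

Setting the option in `~/.Rprofile` or at the top of a script has the same effect for the whole session.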

Background

This threshold was chosen somewhat arbitrarily back in 2015-2016. The important point was to protect users from sending ginormous objects over the internet by mistake. Since futureverse made parallelization so easy, and it was not always obvious to end-users that parallelization took place, there was, and still is, a risk that someone ends up sending anywhere from 1 GB to hundreds of GBs across an enterprise network, or over the internet, without even knowing. At best, this would just slow things down, but it can also incur large egress and ingress costs, or internet costs (some people are still on metered internet, or might even tether over their mobile phone).

library(future)
options(future.globals.maxSize = 1e6) # lower the threshold to 1 MB
x <- rnorm(1e6)
f <- future(sum(x))
#> Error in getGlobalsAndPackages(expr, envir = envir, tweak = tweakExpression,  : 
#>   The total size of the 1 globals exported for future expression ('sum(x)') is 7.63 MiB.
#> This exceeds the maximum allowed size 976.56 KiB (option 'future.globals.maxSize').
#> There is one global: 'x' (7.63 MiB of class 'numeric')

The downside of having such a "hidden" threshold is that users are not aware of it, and the day they hit the limit, they end up having to spend time troubleshooting and understanding the problem. It might also waste compute resources, especially if this built-in protection kicks in at the very end of a long-running process.
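One way to avoid hitting the limit late in a long-running process is to estimate the size of the objects up front. The sketch below uses base R's `object.size()` as an approximation; the future package does its own size accounting, so the numbers may differ slightly. The helper `globals_size()` is hypothetical and not part of the future package.

```r
# Hypothetical helper: approximate size in bytes of named objects.
globals_size <- function(names, envir = parent.frame()) {
  vapply(
    names,
    function(nm) as.numeric(object.size(get(nm, envir = envir))),
    numeric(1)
  )
}

x <- rnorm(1e6)                        # 1e6 doubles, roughly 8e6 bytes
max_size <- 1e6                        # same threshold as in the example

sizes <- globals_size("x")
too_big <- sizes[sizes > max_size]     # objects that would trip the limit

if (length(too_big) > 0) {
  message(
    "Objects above the threshold: ",
    paste(names(too_big), collapse = ", ")
  )
}
```

Running such a check before the expensive part of a pipeline surfaces the problem early instead of at the point where the future is created.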

Example

Actions

  • Make the error message more informative by explaining why the threshold exists and pointing to the help pages for more details
  • Emphasize the availability of this option in the documentation
  • Remove the default threshold, i.e. set it to options(future.globals.maxSize = +Inf)
  • Make it possible to set the threshold when configuring the future backend, e.g. plan(cluster, workers = remote_hosts, globals.maxSize = 500*1024^2)
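Until a per-backend setting exists, a scoped override of the global option is one possible stopgap. The sketch below is base R only; `with_max_size()` is a hypothetical helper, not an existing function in the future package. It relies on the size check happening when the future is created, as the error from `future(sum(x))` suggests.

```r
# Hypothetical helper: evaluate `expr` with a temporary threshold (in bytes).
with_max_size <- function(max_size, expr) {
  old <- options(future.globals.maxSize = max_size)
  on.exit(options(old), add = TRUE)    # restore the previous setting on exit
  expr                                 # forced here, while the option is set
}

before <- getOption("future.globals.maxSize")
inside <- with_max_size(500 * 1024^2, getOption("future.globals.maxSize"))
after <- getOption("future.globals.maxSize")

# Intended usage (assumes the future package is attached):
# f <- with_max_size(+Inf, future(sum(x)))
```

The override only applies to futures created inside the call; futures created elsewhere still see the session-wide value.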

r2evans commented Feb 22, 2025

Would it be realistic to condition the default based on locality of the future? For instance, multisession and all local future plans would benefit from +Inf, where possibly-remote plans might be a little more conservative.
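A rough sketch of what a locality-dependent default could look like, in base R only. The function `default_max_size()`, the list of names treated as local, and the 500 MiB fallback for remote workers are all assumptions made for illustration; nothing like this exists in the future package today.

```r
# Hypothetical: choose a default threshold (in bytes) based on worker locality.
default_max_size <- function(workers, remote_limit = 500 * 1024^2) {
  # Assumption: a numeric `workers` means "this many local workers".
  if (is.numeric(workers)) return(Inf)

  # Assumption: these host names always refer to the local machine.
  local_names <- c("localhost", "127.0.0.1", "::1")
  if (all(workers %in% local_names)) Inf else remote_limit
}

default_max_size(4)                                   # local workers only
default_max_size(c("localhost", "localhost"))         # still local
default_max_size(c("localhost", "n1.example.org"))    # at least one remote
```

A real implementation would need a more robust notion of "local" than matching host names, for example by asking the backend itself.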
