The current default is options(future.globals.maxSize = 500 * 1024^2) = 500 MiB.
Background
This threshold was chosen somewhat arbitrarily back in 2015-2016. The important point was to protect users from sending ginormous objects over the internet by mistake. Because futureverse makes parallelization so easy, and it is not always obvious to end-users that parallelization takes place, there was, and still is, a risk that someone ends up sending anywhere from a few GBs to hundreds of GBs across an enterprise network, or over the internet, without even knowing. At best, this just slows things down, but it can also incur large egress and ingress costs, or internet costs (some people are still on metered internet, or might even tether over their mobile phone).
The downside of such a "hidden" threshold is that users are not aware of it, and the day they hit the limit, they have to spend time troubleshooting and understanding the problem. It might also waste compute resources, especially if this built-in protection kicks in at the very end of a long-running process.
Example
library(future)
options(future.globals.maxSize = 1e6)  # lower the threshold to 1 MB
x <- rnorm(1e6)
f <- future(sum(x))
#> Error in getGlobalsAndPackages(expr, envir = envir, tweak = tweakExpression, :
#>   The total size of the 1 globals exported for future expression ('sum(x)') is 7.63 MiB.
#>   This exceeds the maximum allowed size 976.56 KiB (option 'future.globals.maxSize').
#>   There is one global: 'x' (7.63 MiB of class 'numeric')
Actions
Make the error message more informative by explaining why the threshold exists and pointing to the help pages for more details
Emphasize the availability of this option in the documentation
Remove the default threshold, i.e. set it to options(future.globals.maxSize = +Inf)
Make it possible to set the threshold when configuring the future backend, e.g. plan(cluster, workers = remote_hosts, globals.maxSize = 500*1024^2)
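Until any of these actions land, the threshold can already be adjusted per session through the existing option. The following is a minimal sketch that uses only base R; it assumes that object.size() is a good enough stand-in for the size estimate future computes internally, which may differ slightly.

```r
# NULL here means the option is unset and the package default applies
# (500 MiB according to the text above).
getOption("future.globals.maxSize")

# Raise the threshold to 2 GiB for this session; keep the old value
# so it can be restored later.
old <- options(future.globals.maxSize = 2 * 1024^3)

# Rough check of whether a global would fit under the threshold.
# object.size() is base R and only approximates what future measures.
x <- rnorm(1e6)
size_bytes <- as.numeric(object.size(x))
fits <- size_bytes <= getOption("future.globals.maxSize")

# Disable the check entirely, as proposed in the third action.
options(future.globals.maxSize = +Inf)

# Restore whatever was set before.
options(old)
```

Setting the option back with options(old) keeps the change scoped, which matters when the code runs inside a package or a shared script.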
Would it be realistic to condition the default based on locality of the future? For instance, multisession and all local future plans would benefit from +Inf, where possibly-remote plans might be a little more conservative.
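One way to read this suggestion, sketched in base R: the helper default_globals_max_size() and its arguments are hypothetical and not part of the future package, and the locality test (matching worker names against localhost and the current node name) is an assumption made only for illustration.

```r
# Hypothetical helper, NOT part of the future package.
# Returns +Inf when all workers look local, otherwise a conservative limit.
default_globals_max_size <- function(workers = NULL,
                                     remote_limit = 500 * 1024^2) {
  # Names treated as "this machine" (an assumption for this sketch)
  local_names <- c("localhost", "127.0.0.1", Sys.info()[["nodename"]])

  # NULL or a worker count is taken to mean local workers
  is_local <- is.null(workers) ||
    is.numeric(workers) ||
    all(workers %in% local_names)

  if (is_local) +Inf else remote_limit
}

local_limit  <- default_globals_max_size(2)
remote_limit <- default_globals_max_size(c("localhost", "n1.example.org"))
```

A single non-local worker is enough to fall back to the conservative limit, since the cost being guarded against is any transfer over the network.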