MEMORY: Garbage collect future process after value has been collected #69

HenrikBengtsson · 2016-04-26T18:43:42Z

Issue

When using multiprocess futures that rely on PSOCK cluster nodes, background R sessions, or forked processes ("multicore"), large objects may be left behind in those processes after we've collected the value. The processes stay alive in the background. Thus, if we run, say, 20 processes and 19 of them finish early while one keeps processing for a long time thereafter, those 19 processes occupy memory (RAM) unnecessarily.

Suggestion

After retrieving the value of a future, in the R environment of the process where the future was resolved:

1. clean up the working environment
2. launch the garbage collector explicitly

We already do Step 1 before launching new futures for some of the multiprocess future types. We don't garbage collect explicitly anywhere. Also, Step 1 should not be done for "persistent" futures (persistent=TRUE).
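The two steps could be sketched in base R roughly as follows. This is only an illustration: `cleanup_and_gc()` is a hypothetical helper, not a function in the future package, and it assumes that the worker's working environment is passed in explicitly and that persistent futures skip Step 1.

```r
# Hypothetical helper, not part of the future package.
cleanup_and_gc <- function(envir, persistent = FALSE) {
  if (!persistent) {
    # Step 1: remove all objects, including hidden ones, from 'envir'
    rm(list = ls(envir = envir, all.names = TRUE), envir = envir)
  }
  # Step 2: run the garbage collector explicitly
  invisible(gc(verbose = FALSE))
}

# Usage: simulate a worker environment holding a large object
env <- new.env()
assign("big", numeric(1e6), envir = env)
cleanup_and_gc(env)
length(ls(envir = env, all.names = TRUE))  # 0
```

Whether such a call can be made after the value has been retrieved depends on the future type, as noted below.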

It is not clear to me if it is possible to run code after the value has been retrieved for all types of futures. This might be an issue.

HenrikBengtsson · 2016-05-22T18:06:27Z

The simplest solution might be to wrap the future expression expr in something like:

expr_gc <- {
  value <- local(expr)
  gc()
  value
}

This can be used when local=TRUE and persistent=FALSE, which is the most common use case.
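As written, the snippet above is schematic, since a braced block assigned with `<-` is evaluated immediately. One way to build the wrapped expression without evaluating it is `bquote()`. The sketch below assumes `expr` is held as an unevaluated language object; `wrap_with_gc()` is a hypothetical name, not the package's implementation.

```r
# Hypothetical helper: wrap an unevaluated expression so that gc() runs
# after the value has been computed but before it is returned.
wrap_with_gc <- function(expr) {
  bquote({
    value <- local(.(expr))
    gc(verbose = FALSE)
    value
  })
}

# A future expression that creates a large temporary object
expr <- quote({ tmp <- rep(1, 1e6); sum(tmp) })
expr_gc <- wrap_with_gc(expr)

# Evaluate in a fresh environment, as a stand-in for a worker process
result <- eval(expr_gc, envir = new.env())
result  # 1e+06
```

Because `expr` runs inside `local()`, the temporary `tmp` is out of scope by the time `gc()` is called, so its memory can be reclaimed.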

Should we add a gc=TRUE argument to all futures to be able to control this? For instance, futures evaluated in the current R process may have gc=FALSE by default, whereas those evaluated in external processes have gc=TRUE by default.

HenrikBengtsson · 2016-05-22T18:59:04Z

Added gc=FALSE for all futures.

For now, we'll leave it up to the user to specify it, which is somewhat sub-optimal, but good enough until a different default is proven useful/needed. Also, it might be that some futures should be garbage collected whereas others should not. The developer can control this using:

x <- future({ expr }, gc=TRUE)
x %<-% { expr } %tweak% list(gc=TRUE)

HenrikBengtsson · 2016-05-25T01:27:02Z

If/when we collect memory and timing stats per future (Issue #59), we could use these to decide whether running the garbage collector is necessary (e.g. enough memory was allocated) and/or would take only a small fraction of the total evaluation time of the future (i.e. for long-running futures, the time that the garbage collector consumes will be relatively small).

For instance, we can have options controlling when the garbage collector should be run, e.g.

options(future.gc="auto") # Alternatives, FALSE and TRUE
options(future.gc.threshold.time=30) # >= 30 seconds
options(future.gc.threshold.memory=100) # >= 100 MiB more RAM allocated since start.
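A rough sketch of how such options might be consulted is given below. The option names are the ones proposed above and are not claimed to exist in the package. `should_gc()` is a hypothetical helper, its two inputs are assumed to come from the per-future stats of Issue #59, and treating either threshold as sufficient is just one reading of "and/or".

```r
# Hypothetical helper; relies only on the option names proposed above.
should_gc <- function(elapsed_secs, allocated_mib) {
  mode <- getOption("future.gc", FALSE)
  # Plain TRUE/FALSE: always or never garbage collect
  if (is.logical(mode)) return(isTRUE(mode))
  stopifnot(identical(mode, "auto"))
  time_limit <- getOption("future.gc.threshold.time", 30)    # seconds
  mem_limit <- getOption("future.gc.threshold.memory", 100)  # MiB
  # Assumption: exceeding either threshold is enough to trigger gc()
  elapsed_secs >= time_limit || allocated_mib >= mem_limit
}

options(future.gc = "auto")
should_gc(elapsed_secs = 45, allocated_mib = 10)  # TRUE (assuming the default thresholds)
should_gc(elapsed_secs = 2, allocated_mib = 10)   # FALSE (assuming the default thresholds)
```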

HenrikBengtsson added the enhancement label Apr 26, 2016

HenrikBengtsson added a commit that referenced this issue May 22, 2016

MEMORY: Added argument 'gc=FALSE' to all futures [#69]

b2fea6e

HenrikBengtsson added a commit that referenced this issue May 22, 2016

Add 'gc' argument to all future contructors so that %tweak% list(gc=TRUE) works [#69]

cd195a0

HenrikBengtsson closed this as completed May 22, 2016

HenrikBengtsson added a commit that referenced this issue May 22, 2016

Explain in NEWS how to control garbage collector [#69] [ci skip]

992d6c8
