Nested Futures Use More Memory Than They Should #709

jestover opened this issue Jan 4, 2024 · 2 comments


jestover commented Jan 4, 2024

I've been running code with nested loops that keeps running into memory problems, and I have been trying to come up with a small example that shows the issue. In the example I take a random square matrix and build a list of its columns. Obviously you wouldn't use a double loop to do this in R, but it is hopefully a simple and clear example: with purrr the double loop doesn't increase memory usage, while with furrr and future.apply the memory usage explodes.


library(purrr)
library(furrr)
library(future.apply)

# purrr
single_loop <- function(x, n) {
  map(1:n, ~ x[, .x])
}

# future.apply
single_loop_a <- function(x, n) {
  future_lapply(1:n, FUN = function(i) x[, i])
}

# furrr
single_loop_f <- function(x, n) {
  future_map(1:n, ~ x[, .x])
}

# purrr
inner_loop <- function(i, n, x = x) {
  map_dbl(1:n, ~ x[.x, i])
}

outer_loop <- function(x, n) {
  map(1:n, ~ inner_loop(.x, n, x = x))
}

# future.apply
inner_loop_a <- function(i, n, x = x) {
  future_sapply(1:n, FUN = function(j) x[j, i])
}

outer_loop_a <- function(x, n) {
  future_lapply(1:n, FUN = function(i) inner_loop_a(i, n, x))
}

# furrr
inner_loop_f <- function(i, n, x = x) {
  future_map_dbl(1:n, ~ x[.x, i])
}

outer_loop_f <- function(x, n) {
  future_map(1:n, ~ inner_loop_f(.x, n, x = x))
}

n <- 100
x <- matrix(rnorm(n * n), nrow = n)

identical(single_loop(x, n), single_loop_f(x, n))
identical(single_loop(x, n), single_loop_a(x, n))
identical(single_loop(x, n), outer_loop(x, n))
identical(single_loop(x, n), outer_loop_a(x, n))
identical(single_loop(x, n), outer_loop_f(x, n))
# All return TRUE


# With a single loop memory usage is similar
bench::mark(single_loop(x, n))$mem_alloc
# 127KB
bench::mark(single_loop_a(x, n))$mem_alloc
# 243KB
bench::mark(single_loop_f(x, n))$mem_alloc
# 340KB

# With a double loop memory usage remains similar for purrr, but explodes 
# on the other two
bench::mark(outer_loop(x, n))$mem_alloc
# 83.6KB
bench::mark(outer_loop_a(x, n))$mem_alloc
# 11.8MB
bench::mark(outer_loop_f(x, n))$mem_alloc
# 21.1MB

# Try again with a larger matrix
n <- 5000
x <- matrix(rnorm(n * n), nrow = n)

bench::mark(single_loop(x, n))$mem_alloc
bench::mark(single_loop_a(x, n))$mem_alloc
bench::mark(single_loop_f(x, n))$mem_alloc

bench::mark(outer_loop(x, n))$mem_alloc
bench::mark(outer_loop_a(x, n))$mem_alloc
bench::mark(outer_loop_f(x, n))$mem_alloc

As you can see, with purrr the double loop uses about the same amount of memory as the single loop (slightly less, in fact), but with furrr and future.apply it causes memory usage to explode. I ran this example on a 2023 MacBook, but the actual code that I am trying to fix has been running on a Linux cluster. I ran this example using furrr and future.apply because yesterday I logged a bug report about nested loops using future.callr and @HenrikBengtsson pointed out that it was only an issue with furrr. Please let me know if there is any additional information I can provide or help I can give in solving this issue, and thanks for the wonderful collection of packages!

jestover added the bug label Jan 4, 2024
jestover commented Jan 5, 2024

A little more information. I don't know much about memory profiling, so apologies if this is not the best way to present the information, but in the hopes it might be helpful...


library(profmem)
library(dplyr)

n <- 100
x <- matrix(rnorm(n * n), nrow = n)

# For each variant: number of allocations and total bytes allocated
single_loop(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1    201      130448

single_loop_a(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1    364      251360

single_loop_f(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1    475      353128

outer_loop(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1    101       85648

outer_loop_a(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1  17101    12623144

outer_loop_f(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1  27812    22595240
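
To put the totals above in proportion, the sketch below divides each nested total by the 100 outer iterations and compares the result with the size of the numeric data in x (8 bytes per double). The byte counts are copied from the profmem output above; the variable names are only for illustration. This is arithmetic on the reported numbers, not a diagnosis of where the allocations come from.

```r
# Totals reported by profmem above (bytes), for n = 100
n <- 100
single_total <- c(purrr = 130448, future.apply = 251360, furrr = 353128)
nested_total <- c(purrr = 85648, future.apply = 12623144, furrr = 22595240)

# Bytes of numeric data in the n x n matrix (8 bytes per double, header ignored)
matrix_bytes <- 8 * n * n

# Average bytes allocated per outer iteration, i.e. per inner-loop call
per_inner_call <- nested_total / n

# How many times larger the nested total is than the single-loop total
nested_vs_single <- nested_total / single_total

print(per_inner_call)
print(round(per_inner_call / matrix_bytes, 2))
print(round(nested_vs_single, 1))
```

On these numbers, each inner call under future.apply and furrr allocates somewhere between one and three times the size of the whole matrix, while the purrr version allocates roughly the size of a single column per inner call. One possible reading, which I have not verified, is that every inner call touches x as a whole rather than just the column it needs.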

moosterwegel commented Oct 10, 2024

Update: I fixed my problem. Turns out I was not familiar with how future handles environments. See and

I think I ran into the same problem or at least a very similar problem. Apologies for the somewhat convoluted data reconstruction but it's a simulation of the data that I used when I first encountered it. Here's my reprex:


logistic_model <- function(feature, df_other_vars, formula) {
  df <- dplyr::bind_cols(df_other_vars, "x" = feature)
  m <- glm(formula(formula),
           data = df,
           family = binomial(logit))
  return(m)
}

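nested_map <- function(imputed_versions_feature, ...) {
  # Assumption: the inner map is purrr::map(); the original call is not shown.
  models <- imputed_versions_feature |>
    purrr::map(\(imputed_version_feature)
               logistic_model(feature = imputed_version_feature, ...))
  return(models[1]) # originally mice::pool call, but not necessary for demonstration
}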

gen_names <- function(n = 1) {
  mz <- runif(min = 10, max = 200, n = n) |> signif(7)
  rt <- runif(min = 0, max = 12, n = n) |> signif(7)
  string <- glue::glue("X{mz}_{rt}")
  return(string)
}

gen_x <- function(dummy, nr_imputations = 60, n = 1000) {
  x <- replicate(nr_imputations, rnorm(n)) |> tibble::as_tibble()
  return(x)
}

list_of_feature_dfs <- gen_names(1024) |>
  tibble::as_tibble() |>
  tidyr::pivot_wider(names_from = value) |>
  purrr::map(gen_x) # assumed final step; not shown in the original

df <- tibble::tibble(y = rbinom(1000, 1, 0.5))

seed <- 1309
furrr_options <- furrr::furrr_options(seed = seed)
future::plan(future::multisession, workers = 16)

# no problems
r <- list_of_feature_dfs |> 
  furrr::future_map(\(feature) nested_map(imputed_versions_feature = feature, 
                                          df_other_vars = df,
                                          formula = 'y ~ x'),
                    .progress = TRUE,
                    .options = furrr_options)

# same as above but via function call: CPUs never really get going, memory keeps increasing - doesn't finish
wrapper <- function(list_of_feature_dfs, formula, df_other_vars, furrr_options) {
  results <- list_of_feature_dfs %>%
    furrr::future_map(\(feature) nested_map(imputed_versions_feature = feature,
                                            df_other_vars = df_other_vars,
                                            formula = formula),
                      .progress = TRUE,
                      .options = furrr_options)
  return(results)
}

r_function <- wrapper(list_of_feature_dfs, 'y ~ x', df, furrr_options = furrr_options)

Session info:

> sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2019 x64 (build 17763)

Matrix products: default

[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

time zone: Etc/UTC
tzcode source: internal

attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
  [1] tibble_3.2.1  tidyr_1.3.1   dplyr_1.1.4   purrr_1.0.2  
[5] furrr_0.3.1   future_1.34.0

loaded via a namespace (and not attached):
  [1] digest_0.6.37     utf8_1.2.4        R6_2.5.1         
[4] codetools_0.2-19  tidyselect_1.2.1  magrittr_2.0.3   
[7] glue_1.8.0        parallel_4.3.2    pkgconfig_2.0.3  
[10] generics_0.1.3    lifecycle_1.0.4   cli_3.6.3        
[13] fansi_1.0.6       parallelly_1.38.0 vctrs_0.6.5      
[16] compiler_4.3.2    globals_0.16.3    rstudioapi_0.16.0
[19] tools_4.3.2       listenv_0.9.1     pillar_1.9.0     
[22] rlang_1.1.4   

I think the nested map is not the main culprit for me. It's when I put the future call inside a function that I really run into this issue: the CPUs never really get going, but memory usage keeps increasing. In fact, I run out of 32GB of memory before the code is close to finishing. I have been able to reproduce this consistently across three different machines (Windows, Windows Server, and a Docker container running Ubuntu via WSL). Any ideas? Or anything I should look into? Thanks!
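
The update at the top of this comment points at how future handles environments. One mechanism that would fit the symptoms, offered here as an assumption rather than a confirmed diagnosis: a function created inside another function keeps a reference to the environment it was created in, and serializing that function (which is how objects reach multisession workers) also serializes everything in that environment. The sketch below uses only base R; make_worker_fn and the object size are hypothetical stand-ins, not taken from the code above.

```r
# Hypothetical stand-in for a wrapper that holds a large object and
# creates a small function inside its own body.
make_worker_fn <- function(n_values) {
  big <- rnorm(n_values)   # large object living in this function's environment
  function(i) i + 1        # never uses `big`, but its enclosure contains it
}

fn_enclosing <- make_worker_fn(1e6)   # `big` holds about 8 MB of doubles

# Same function body, but with the global environment as its enclosure.
fn_global <- fn_enclosing
environment(fn_global) <- globalenv()

# serialize() writes a closure together with its enclosing environment,
# unless that environment is the global environment or a package/namespace.
size_enclosing <- length(serialize(fn_enclosing, NULL))
size_global <- length(serialize(fn_global, NULL))

print(c(enclosing = size_enclosing, global = size_global))
```

In the wrapper() example, the anonymous function passed to furrr::future_map() is created inside wrapper(), whose environment holds list_of_feature_dfs, whereas in the "no problems" version it is created at top level. If the mechanism sketched here applies, that difference alone could account for the memory growth.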

