Overwriting an imported file will segfault (or garble the file) #1273

Closed
knokknok opened this issue Aug 14, 2021 · 4 comments
Comments

@knokknok

The following crashes on my computer (or corrupts the file):

library(readr)

file <- file.path("/tmp/tmp.csv")
write.csv(cars, file)

data <- read_csv(file)
data$speed <- 2*data$speed
write.csv(data, file=file)

sessionInfo()

R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] readr_2.0.1

loaded via a namespace (and not attached):
 [1] compiler_4.1.1  magrittr_2.0.1  ellipsis_0.3.2  R6_2.5.0
 [5] hms_1.1.0       pillar_1.6.2    tibble_3.1.3    crayon_1.4.1
 [9] utf8_1.2.2      fansi_0.5.0     vctrs_0.3.8     tzdb_0.1.2
[13] lifecycle_1.0.0 pkgconfig_2.0.3 rlang_0.4.11
@martin-rueegg

(For a solution/workaround, see the "Solution" section further down this comment.)

Not sure this is related, but when I try to write to a previously imported file, I get the following error:

Error in file(file, ifelse(append, "a", "w")) : 
  cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :

This happens when I use readr::read_csv to read and utils::write.csv to write.

The stack trace is:

11. file(file, ifelse(append, "a", "w"))
10. write.table(COMICS1.ALL.df, COMICS1_KO_and_tara.ALL.csv, row.names = FALSE,
        col.names = TRUE, sep = ",", dec = ".", qmethod = "double")
 9. eval(expr, p)
 8. eval(expr, p)
 7. eval.parent(Call)
 6. write.csv(COMICS1.ALL.df, COMICS1_KO_and_tara.ALL.csv, row.names = FALSE) at COMICS1_KO_and_tara.ALL.R#310
 5. load.COMICS1.fn(force = 0, update = TRUE) at COMICS1_KO_and_tara.ALL.R#371
 4. eval(ei, envir)
 3. eval(ei, envir)
 2. withVisible(eval(ei, envir))
 1. source(paste0(comics1.1.KO_Tax.path, "/All kos/COMICS1_KO_and_tara.ALL.R"))

What is interesting is that, at that very moment, the file is open by (and only by) rsession.exe itself. So it seems to be related to lazy loading!

When using readr::read_csv2, the problem shows up when merging the imported data frame with another one: the merge never happens; instead, rsession.exe crashes and restarts continuously.

Solution

When I use

lazy = FALSE

# as in ...
readr::read_csv(file, lazy = FALSE)

then the error does not happen.

So it would be good to add a warning to the documentation that one needs to disable lazy loading if one intends to update/overwrite the file later.
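
For completeness, a minimal sketch of the safe round-trip pattern with this workaround (the temporary-file path and row.names = FALSE are just illustrative choices):

library(readr)

path <- tempfile(fileext = ".csv")
write.csv(cars, path, row.names = FALSE)

# Reading eagerly means readr no longer holds the file once read_csv() returns,
# so writing back to the same path afterwards is safe.
data <- read_csv(path, lazy = FALSE)
data$speed <- 2 * data$speed
write.csv(data, file = path, row.names = FALSE)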

Info

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    
system code page: 65001

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.0 ggplot2_3.3.5     digest_0.6.27     tibble_3.1.3      tidyr_1.1.3       dplyr_1.0.7      
[7] readr_2.0.1      

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13  magrittr_2.0.1   hms_1.1.0        bit_4.0.4        munsell_0.5.0    tidyselect_1.1.1 colorspace_2.0-2
 [8] R6_2.5.0         rlang_0.4.11     fansi_0.5.0      stringr_1.4.0    tools_4.1.0      parallel_4.1.0   grid_4.1.0      
[15] vroom_1.5.4      gtable_0.3.0     utf8_1.2.2       withr_2.4.2      askpass_1.1      ellipsis_0.3.2   bit64_4.0.5     
[22] openssl_1.4.4    lifecycle_1.0.0  crayon_1.4.1     purrr_0.3.4      tzdb_0.1.2       vctrs_0.3.8      glue_1.4.2      
[29] stringi_1.7.3    compiler_4.1.0   pillar_1.6.2     scales_1.1.1     generics_0.1.0   pkgconfig_2.0.3 

@martin-rueegg

On a further note:

Since the default for readr::read_delim seems to be lazy = TRUE, there are other undesired side effects:

Because readr::read_delim keeps the file locked, any move/rename or update from an external source fails (be it the user manually editing the CSV file or a server synchronizing it).

We had recurring issues with this, but only now do I realize that they relate to lazy loading!
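
A rough sketch of the locking side effect (assuming Windows file-locking semantics; on macOS/Linux the rename may still succeed while the file is mapped):

library(readr)

path <- tempfile(fileext = ".csv")
write.csv(cars, path, row.names = FALSE)

d <- read_csv(path)                       # lazy = TRUE is the current default
file.rename(path, paste0(path, ".bak"))   # typically FALSE on Windows: the file is still held open

rm(d); gc()                               # dropping the lazy columns and collecting garbage should release the handle
file.rename(path, paste0(path, ".bak"))   # now succeeds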

While I understand the choice to load lazily by default, I would not assume that this is what's happening. I would strongly suggest re-evaluating this decision and switching to non-lazy loading by default. Alternatively, an additional verb could be introduced, e.g. readr::open_delim (with its derivatives readr::open_csv, readr::open_tsv, etc.), that would default to lazy loading. Read, as opposed to open, does not suggest that the file stays open and locked in the background.

Another option would be to show a warning message when using the default setting lazy = NULL (a new default that behaves like TRUE, but emits the warning). Setting it explicitly to TRUE would disable the warning.

@martin-rueegg

Hi @jimhester

Thanks for updating the documentation. I had not seen that in the documentation before; apparently it was already there. The problem is that you encounter the error while writing, not while reading, the file. And one cannot always read the entire documentation to find such an important side effect.

My suggestion would be to either

  • change the default (probably not the best idea),
  • emit a warning when reading with no lazy parameter set explicitly (which could be disabled by passing lazy = TRUE; this would be my favorite), or
  • put the warning prominently at the top of the documentation page (which still does not solve the problem of discovering that information only while writing).

Would you accept a PR for implementing the warning message?
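
As a rough sketch, the warning could look something like this user-side wrapper (read_csv_checked is a hypothetical helper, not part of the readr API):

read_csv_checked <- function(file, ..., lazy = NULL) {
  if (is.null(lazy)) {
    warning(
      "read_csv() defaults to lazy = TRUE and keeps '", file, "' open; ",
      "pass lazy = FALSE if you plan to overwrite or move the file later.",
      call. = FALSE
    )
    lazy <- TRUE
  }
  readr::read_csv(file, ..., lazy = lazy)
}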

@Shians

Shians commented Aug 18, 2021

Agreed with the above; it's a major side effect that arguably breaks the drop-in quality of the read_* functions as replacements for the base and data.table equivalents. I think there's a case to be made for least surprise over maximum performance.
