issue with whitespace parsing in read_table #1118

Closed
phil-grayson opened this issue Aug 2, 2020 · 4 comments

@phil-grayson

Hi,

I've just completed a genome-wide association run with PLINK (v1.07, 10/Aug/2009).

One of the output files is space-delimited, and after some googling I found that read_table is the tidyverse solution. I'm running readr 1.3.1 in R 4.0.0. I've tried the following (the first 100 lines of the file, from head -100, are attached below for testing).

assoc_readr <- read_table("readr_test.txt", col_names = TRUE)
This one parses the first 3 columns correctly and then lumps the last 7 together.

assoc_base <- read.table("readr_test.txt", sep = "", header = TRUE)
This one does it right: 10 columns parsed on whitespace. The whole file is just shy of a gigabyte, so I had wanted to use the tidyverse to load it (love that progress bar!).

(https://github.com/tidyverse/readr/files/5012278/readr_test.txt)

I was able to get it all sorted out with base R, but wanted to make an issue so that it can be fixed if it is a larger problem. cheers!
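
For readers who want to reproduce the base R behaviour described above without the attachment, here is a minimal sketch. The rows and column names are hypothetical stand-ins (not taken from readr_test.txt); the only point is that read.table() with its default sep = "" splits on any run of whitespace and ignores leading spaces.

```r
# Hypothetical rows with uneven runs of spaces between fields
# (an assumption for illustration, not the attached file).
txt <- c(
  " CHR   SNP      BP        P",
  "   1   rs1     100   0.0312",
  "  22   rs20  15000      0.5"
)

# sep = "" (the default) means "one or more whitespace characters".
assoc <- read.table(
  text = txt,
  sep = "",
  header = TRUE,
  stringsAsFactors = FALSE
)

str(assoc)
```

Because the split is on whitespace rather than on character position, the fields do not need to line up from row to row.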

@jimhester
Collaborator

jimhester commented Aug 3, 2020

readr::read_table() uses the same width for all columns.

The function analogous to utils::read.table() is readr::read_table2(), which breaks on whitespace.

This is mentioned in ?read_table

‘read_table2()’ is like ‘read.table()’, it allows any number of
whitespace characters between columns, and the lines can be of
different lengths.

‘read_table()’ is more strict, each line must be the same length,
and each field is in the same position in every line. It first
finds empty columns and then parses like a fixed width file.
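
The two strategies in the help text quoted above can be sketched in base R. This is only an illustration of the idea, not readr's actual implementation; the sample lines and every variable name are hypothetical.

```r
# Hypothetical aligned lines: a 15-character name field, then two numeric fields.
lines <- sprintf(
  "%-15s  %s %2s",
  c("NAME", "very-long-entry", "fi fi fo fum"),
  c("X", "1", "3"),
  c("Y", "2", "23")
)

# Strategy 1 (read.table / read_table2 style): split on any run of whitespace.
by_whitespace <- strsplit(trimws(lines), "[[:space:]]+")
lengths(by_whitespace)  # the row with embedded spaces yields extra fields

# Strategy 2 (strict style): every line must have the same length ...
stopifnot(length(unique(nchar(lines))) == 1)

# ... then find character positions that are blank in every line ...
chars <- do.call(rbind, strsplit(lines, ""))
blank <- colSums(chars != " ") == 0

# ... and treat each run of non-blank positions as one fixed-width field.
runs <- rle(blank)
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
field_start <- starts[!runs$values]
field_end <- ends[!runs$values]

by_position <- lapply(lines, function(l) {
  trimws(substring(l, field_start, field_end))
})
by_position[[3]]
```

With the positional strategy, an entry containing spaces stays in one field as long as some other row fills those positions; with the whitespace strategy it is broken into several fields.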

@postylem

postylem commented Feb 5, 2022

Just a side note for clarification:

readr::read_table() uses the same width for all columns.

I don't think that's true, if I'm interpreting you right. With read_table(), the columns don't have to all be the same width as each other, but any one column is the same width for all rows. In contrast, read_table2() will break on any whitespace, so columns don't have to be aligned.

For example, if myfile.txt contains:

NAME                         X  Y    Z
very-long-entry              1  2    3
short                        1  1  2.3
fi fi fo fum                 3 23  101
something                    3 -1  s 0 

then read_table('myfile.txt') gives

# A tibble: 4 x 4
  NAME                X     Y Z    
  <chr>           <dbl> <dbl> <chr>
1 very-long-entry     1     2 3    
2 short               1     1 2.3  
3 fi fi fo fum        3    23 101  
4 something           3    -1 s 0  

while read_table2('myfile.txt') gives the following (with warnings about parsing failures)

# A tibble: 4 x 4
  NAME            X     Y     Z    
  <chr>           <chr> <chr> <chr>
1 very-long-entry 1     2     3    
2 short           1     1     2.3  
3 fi              fi    fo    fum  
4 something       3     -1    s    

@jennybc
Member

jennybc commented Feb 5, 2022

Things have changed since this issue was current.

https://readr.tidyverse.org/news/index.html#deprecated-or-superseded-functions-and-features-2-0-0

In readr 2.0.0, released 2021-07-20 (after this issue thread concluded):

read_table2() has been renamed to read_table(), as most users expect read_table() to work like utils::read.table(). If you want the previous strict behavior of the read_table() you can use read_fwf() with fwf_empty() directly (#717).
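
A minimal sketch of the read_fwf() plus fwf_empty() route mentioned in that note, assuming readr 2.0.0 or later is installed. The file contents, the temporary path, and the column names are hypothetical. read_fwf() does not read a header row, so this sketch skips the header line and passes the names explicitly.

```r
library(readr)

# Hypothetical aligned file written to a temporary path.
path <- tempfile(fileext = ".txt")
writeLines(
  sprintf(
    "%-15s  %s %2s",
    c("NAME", "very-long-entry", "fi fi fo fum"),
    c("X", "1", "3"),
    c("Y", "2", "23")
  ),
  path
)

# Guess field positions from columns that are empty in every data line,
# skipping the header line and supplying the names by hand.
col_names <- c("NAME", "X", "Y")
positions <- fwf_empty(path, skip = 1, col_names = col_names)

# Parse as a fixed-width file using those positions.
strict <- read_fwf(path, col_positions = positions, skip = 1)
strict
```

fwf_empty() only inspects the first n lines (100 by default), so a large file whose alignment changes further down may need a larger n or explicit positions.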

@postylem

postylem commented Feb 5, 2022

Thanks. I got here from a search, and was confused for a bit, so thought I'd make note for others who might land here too. Thanks for clarifying.
