Documentation request: behavior of read_*_chunked when number of rows exceeds maximum integer value #1177
Comments
Reading the file should 'work', though it is going to be very slow. I generated a file with 4 billion lines of just `1` and then read it with `read_lines_raw_chunked("out", function(x, pos) print(pos), chunk_size = 10000000, progress = FALSE)`. It took over an hour to get to 2 billion rows read and start wrapping. As this is about as simple an input as you can get, reading more complex data and doing anything useful with it seems like it would take too long to be practical. I think you would probably be better off looking into other tools more suited to handling data of this size.
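For reference, a minimal sketch of the test described above; the file name `out`, the batch structure used to generate the file, and the sizes are illustrative, and writing the 4-billion-line file itself takes a long time and several gigabytes of disk:

```r
library(readr)

# Generate a file of 4 billion lines containing just "1", written in
# batches of 10 million lines so the whole thing is never held in memory.
con <- file("out", open = "w")
batch <- paste0(rep("1", 1e7), collapse = "\n")
for (i in seq_len(400)) writeLines(batch, con)
close(con)

# Read it back 10 million rows at a time; the callback only prints the
# running position, so this is about the cheapest chunked read possible.
read_lines_raw_chunked(
  "out",
  function(x, pos) print(pos),
  chunk_size = 10000000,
  progress = FALSE
)
```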
Thanks Jim. I've noticed that `read_*_chunked` is slow when printing.

EDIT: I'll give this a try and get back.
Previously the read would overflow in these cases. Fixes #1177
Thank you. I'll give this a shot!
Hello,
Thank you for the response to my previous issue.
Can I safely use read_*_chunked when the number of rows in the file exceeds R's maximum integer value of about 2 billion? I will be reading fewer than 2 billion rows per chunk. Moreover, I will not use the index or "pos" argument in the callback functions.
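For illustration, a hedged sketch of the pattern described in the question, a chunked read whose callback ignores `pos` and only does per-chunk work; the file name, chunk size, and the row-counting callback are placeholders rather than details from the issue:

```r
library(readr)

total_rows <- 0

# Process the file one chunk at a time; the callback ignores `pos` and
# only accumulates a running row count as a side effect.
read_csv_chunked(
  "big_file.csv",
  callback = function(chunk, pos) {
    total_rows <<- total_rows + nrow(chunk)
  },
  chunk_size = 1000000,
  progress = FALSE
)

total_rows  # running row count, accumulated one chunk at a time
```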