
RFE - caching downloads #162

Closed
zenfish opened this issue Feb 19, 2025 · 10 comments · Fixed by #183
Labels
enhancement New feature or request

Comments

@zenfish

zenfish commented Feb 19, 2025

First, thanks for putting this out.

After a test run, I saw some 404's (apparently caused by some books already being on too many devices)... so I deleted the books from old devices, ran it again, and it proceeded to download everything all over again.

It'd be great not to have to download the entire set every time. Even something simpleminded would do (sketched below) -

  • check if the book is already downloaded and non-zero size (e.g. "A Tale of Two Cities.azw")
  • if it doesn't already exist, start downloading to a tmp file ("A Tale of Two Cities.tmp" or w/e)
  • if successfully downloaded, mv the tmp file to its final resting spot

Or heck, just toggle the write bit in the filesystem after a successful download if it's not too ham-handed. Of course YMMV.
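A minimal sketch of that flow in TypeScript (illustrative only; downloadIfMissing is a made-up name, not this tool's actual code, and it assumes Node 18+ for the built-in fetch):

    import { promises as fs } from "node:fs";

    // Hypothetical helper: skip books that already exist with non-zero size,
    // download to a .tmp file first, and only rename into place on success,
    // so a failed run never leaves a partial final file behind.
    async function downloadIfMissing(url: string, destPath: string): Promise<void> {
      try {
        const stat = await fs.stat(destPath);
        if (stat.size > 0) return; // already downloaded; skip
      } catch {
        // no file yet; fall through and download
      }

      const tmpPath = `${destPath}.tmp`;
      const res = await fetch(url);
      if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
      await fs.writeFile(tmpPath, Buffer.from(await res.arrayBuffer()));
      await fs.rename(tmpPath, destPath); // mv tmp to its final resting spot
    }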

Tnx again!

@rsholmes

There is the --startFromOffset option, but it seems not very useful because as far as I can tell, books don't necessarily have the same index on different runs. Books 1–100 this time aren't always the same as books 1–100 next time.

@treetrum
Owner

Thanks for the suggestion; this is next up on my list to get done, hopefully within the next 24h or so. Will likely be removing the startFromOffset option as well, as I've experienced the same thing @rsholmes describes: the index is not consistent enough for this option to be useful anymore. (This is one of the oldest parameters of the tool, from a few years ago, so I'm not surprised it no longer works as it originally did.)

treetrum added the enhancement (New feature or request) label Feb 19, 2025
@michaelsmanley

I hacked together a method for doing this. Before fetching the book URL with a GET, I fetch it with a HEAD, which does include the content-length for the file. Then I look at the downloads folder and if I find the file there with the correct size already, I skip the GET. Otherwise, progress as usual.
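Roughly, the idea looks like this (a sketch only, not the actual PR code; alreadyDownloaded is a made-up name, and it assumes Node 18+'s built-in fetch):

    import { promises as fs } from "node:fs";

    // Sketch: compare the remote content-length (from a HEAD request)
    // against the size of any file already on disk; only GET when they differ.
    async function alreadyDownloaded(url: string, filePath: string): Promise<boolean> {
      const head = await fetch(url, { method: "HEAD" });
      const remoteSize = Number(head.headers.get("content-length") ?? -1);
      try {
        const { size } = await fs.stat(filePath);
        return size === remoteSize;
      } catch {
        return false; // no local copy yet
      }
    }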

I'll put up a PR. It is ugly but it worked and I managed to download 5600+ books with this script. Thank you for creating it!

@zenfish
Author

zenfish commented Feb 20, 2025

I was wondering if HEAD would save the day, but I was writing from the car when I filed my RFE and didn't look at the source ;). Fine idea.

@eviltofu

> There is the --startFromOffset option, but it seems not very useful because as far as I can tell, books don't necessarily have the same index on different runs. Books 1–100 this time aren't always the same as books 1–100 next time.

If the list were sorted by title before downloading, would that make the index consistent?

@treetrum
Owner

I have an in-flight PR that skips downloading files that already exist; it's available here if anyone would like to test: #183

To save an extra request, I'm currently just issuing the same GET and aborting it when we already have the file on disk, but I'll play around with using a HEAD request to see if there's an appreciable difference.
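Roughly, the abort-the-GET approach looks like this (a sketch, not the PR's code; fetchUnlessCached is a made-up name, and it assumes Node 18+'s fetch and AbortController):

    import { promises as fs } from "node:fs";

    // Sketch: start the GET, check the content-length header against the
    // file on disk, and abort before consuming the body if the sizes match.
    async function fetchUnlessCached(url: string, filePath: string): Promise<Buffer | null> {
      const controller = new AbortController();
      const res = await fetch(url, { signal: controller.signal });
      const remoteSize = Number(res.headers.get("content-length") ?? -1);
      try {
        const { size } = await fs.stat(filePath);
        if (size === remoteSize) {
          controller.abort(); // headers were enough; skip the body
          return null;
        }
      } catch {
        // no local file; carry on downloading
      }
      return Buffer.from(await res.arrayBuffer());
    }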

Thanks for the input so far!

treetrum linked a pull request Feb 21, 2025 that will close this issue
@rsholmes

rsholmes commented Feb 21, 2025

Seems to work for me, except that it finds only 200 books. The main branch version finds 1234 books.

I tried changing

    if (
      !data.GetContentOwnershipData.hasMoreItems ||
      allItems.length >= options.totalDownloads
    ) {

to

    if (
      data.GetContentOwnershipData.items.length < batchSize ||
      allItems.length >= options.totalDownloads
    ) {

in index.ts and it went back to finding 1234 books. All 1234 are now downloaded. Thanks!
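For context, that condition sits in a pagination loop shaped roughly like this (identifiers here are stand-ins, not the tool's exact code); the change stops paging when a batch comes back short instead of trusting the API's hasMoreItems flag:

    type OwnershipResponse = {
      GetContentOwnershipData: { items: unknown[]; hasMoreItems: boolean };
    };

    // Stand-in for the tool's real batched ownership request in index.ts;
    // the URL here is a placeholder, not the real endpoint.
    async function fetchBatch(offset: number, size: number): Promise<OwnershipResponse> {
      const res = await fetch(`https://example.invalid/ownership?offset=${offset}&size=${size}`);
      return res.json();
    }

    async function fetchAllItems(totalDownloads: number, batchSize = 50) {
      const allItems: unknown[] = [];
      for (let offset = 0; ; offset += batchSize) {
        const data = await fetchBatch(offset, batchSize);
        allItems.push(...data.GetContentOwnershipData.items);
        if (
          data.GetContentOwnershipData.items.length < batchSize ||
          allItems.length >= totalDownloads
        ) {
          break;
        }
      }
      return allItems;
    }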

@geetlord

I had the same issue with hitting a 200-book limit. The above edit also seems to have fixed it for me.

@michaelsmanley

The hack I used to call HEAD before attempting the GET is in #187.

It's ugly but it worked for me to snag my 5600+ book library successfully.

@treetrum
Owner

@rsholmes thanks for the tip! That does indeed seem to fix the problem. I've updated that PR with the fix (plus a few other improvements).

I'd like a few people to test that PR before I merge it, though, so if anyone here is willing to try it out again and report back, it would be greatly appreciated!
