
RFE - caching downloads #162

Closed
zenfish opened this issue Feb 19, 2025 · 10 comments · Fixed by #183
Labels
enhancement New feature or request

Comments

@zenfish

zenfish commented Feb 19, 2025

First, thanks for putting this out.

After a test run, I saw some 404's (apparently caused by some books already being on too many devices)... so I deleted the books from old devices, ran it again, and it proceeded to download everything all over again.

It'd be great not to have to download the entire set every time. Even something simpleminded would do (sketched below) -

  • check if the book is already downloaded and non-zero size (e.g. "A Tale of Two Cities.azw")
  • if it doesn't already exist, start downloading to a tmp file ("A Tale of Two Cities.tmp" or w/e)
  • if successfully downloaded, mv the tmp file to its final resting spot

Or heck, just toggle the write bit in the filesystem after a successful download if it's not too ham-handed. Of course YMMV.
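A minimal sketch of that flow in TypeScript (illustrative only; downloadIfMissing is a made-up name, not this tool's actual code, and it assumes Node 18+ for the built-in fetch):

    import { promises as fs } from "node:fs";

    // Hypothetical helper: skip books that already exist with non-zero size,
    // download to a .tmp file first, and only rename into place on success,
    // so a failed run never leaves a partial final file behind.
    async function downloadIfMissing(url: string, destPath: string): Promise<void> {
      try {
        const stat = await fs.stat(destPath);
        if (stat.size > 0) return; // already downloaded; skip
      } catch {
        // no file yet; fall through and download
      }

      const tmpPath = `${destPath}.tmp`;
      const res = await fetch(url);
      if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
      await fs.writeFile(tmpPath, Buffer.from(await res.arrayBuffer()));
      await fs.rename(tmpPath, destPath); // mv tmp to its final resting spot
    }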

Tnx again!

@rsholmes

There is the --startFromOffset option, but it seems not very useful because as far as I can tell, books don't necessarily have the same index on different runs. Books 1–100 this time aren't always the same as books 1–100 next time.

@treetrum
Owner

Thanks for the suggestion; this is next up on my list to get done, hopefully within the next 24h or so. Will likely be removing the startFromOffset option as well, as I've experienced the same thing @rsholmes describes: the index is not consistent enough for this option to be useful anymore. (This is one of the oldest parameters of the tool, from a few years ago, so I'm not surprised it no longer works as it originally did.)

treetrum added the enhancement (New feature or request) label Feb 19, 2025
@michaelsmanley

I hacked together a method for doing this. Before fetching the book URL with a GET, I fetch it with a HEAD, which does include the content-length for the file. Then I look at the downloads folder and if I find the file there with the correct size already, I skip the GET. Otherwise, progress as usual.
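Roughly, the idea looks like this (a sketch only, not the actual PR code; alreadyDownloaded is a made-up name, and it assumes Node 18+'s built-in fetch):

    import { promises as fs } from "node:fs";

    // Sketch: compare the remote content-length (from a HEAD request)
    // against the size of any file already on disk; only GET when they differ.
    async function alreadyDownloaded(url: string, filePath: string): Promise<boolean> {
      const head = await fetch(url, { method: "HEAD" });
      const remoteSize = Number(head.headers.get("content-length") ?? -1);
      try {
        const { size } = await fs.stat(filePath);
        return size === remoteSize;
      } catch {
        return false; // no local copy yet
      }
    }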

I'll put up a PR. It is ugly but it worked and I managed to download 5600+ books with this script. Thank you for creating it!

@zenfish
Author

zenfish commented Feb 20, 2025

I was wondering if HEAD would save the day, but I was writing from the car when I filed my RFE and didn't look at the source ;). Fine idea.

@eviltofu

> There is the --startFromOffset option, but it seems not very useful because as far as I can tell, books don't necessarily have the same index on different runs. Books 1–100 this time aren't always the same as books 1–100 next time.

If the list were sorted by title before downloading, would that make the index consistent?

@treetrum
Owner

I have an in-flight PR that skips downloading files that already exist; it's available here if anyone would like to test: #183

To save an extra request, I'm currently just issuing the same GET and aborting it when we already have the file on disk, but I'll play around with using a HEAD request to see if there's an appreciable difference.
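Roughly, the abort-the-GET approach looks like this (a sketch, not the PR's code; fetchUnlessCached is a made-up name, and it assumes Node 18+'s fetch and AbortController):

    import { promises as fs } from "node:fs";

    // Sketch: start the GET, check the content-length header against the
    // file on disk, and abort before consuming the body if the sizes match.
    async function fetchUnlessCached(url: string, filePath: string): Promise<Buffer | null> {
      const controller = new AbortController();
      const res = await fetch(url, { signal: controller.signal });
      const remoteSize = Number(res.headers.get("content-length") ?? -1);
      try {
        const { size } = await fs.stat(filePath);
        if (size === remoteSize) {
          controller.abort(); // headers were enough; skip the body
          return null;
        }
      } catch {
        // no local file; carry on downloading
      }
      return Buffer.from(await res.arrayBuffer());
    }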

Thanks for the input so far!

treetrum linked a pull request Feb 21, 2025 that will close this issue
@rsholmes

rsholmes commented Feb 21, 2025

Seems to work for me, except that it finds only 200 books. The main branch version finds 1234 books.

I tried changing

    if (
      !data.GetContentOwnershipData.hasMoreItems ||
      allItems.length >= options.totalDownloads
    ) {

to

    if (
      data.GetContentOwnershipData.items.length < batchSize ||
      allItems.length >= options.totalDownloads
    ) {

in index.ts and it went back to finding 1234 books. All 1234 are now downloaded. Thanks!
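For context, that condition sits in a pagination loop shaped roughly like this (identifiers here are stand-ins, not the tool's exact code); the change stops paging when a batch comes back short instead of trusting the API's hasMoreItems flag:

    type OwnershipResponse = {
      GetContentOwnershipData: { items: unknown[]; hasMoreItems: boolean };
    };

    // Stand-in for the tool's real batched ownership request in index.ts;
    // the URL here is a placeholder, not the real endpoint.
    async function fetchBatch(offset: number, size: number): Promise<OwnershipResponse> {
      const res = await fetch(`https://example.invalid/ownership?offset=${offset}&size=${size}`);
      return res.json();
    }

    async function fetchAllItems(totalDownloads: number, batchSize = 50) {
      const allItems: unknown[] = [];
      for (let offset = 0; ; offset += batchSize) {
        const data = await fetchBatch(offset, batchSize);
        allItems.push(...data.GetContentOwnershipData.items);
        if (
          data.GetContentOwnershipData.items.length < batchSize ||
          allItems.length >= totalDownloads
        ) {
          break;
        }
      }
      return allItems;
    }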

@geetlord

I had the same issue with hitting a 200-book limit. The above edit also seems to have fixed it for me.

@michaelsmanley

The hack I used to call HEAD before attempting the GET is in #187.

It's ugly but it worked for me to snag my 5600+ book library successfully.

@treetrum
Owner

@rsholmes thanks for the tip! That does indeed seem to fix the problem. I've updated that PR with the fix (plus a few other improvements).

I'd like a few people to test that PR before I merge it, though, so if anyone here is willing to try it out again and report back, it would be greatly appreciated!
