-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid getting stuck in empty nodes in the Pages tree when calling |Catalog_getPageDict| (issue 5644) #5655
Conversation
…talog_getPageDict| (issue 5644)
/botio test |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://107.22.172.223:8877/c6bdd383996e251/output.txt |
From: Bot.io (Linux)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://107.21.233.14:8877/77970bc91c955d0/output.txt |
From: Bot.io (Windows)SuccessFull output at http://107.22.172.223:8877/c6bdd383996e251/output.txt Total script time: 17.52 mins
|
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/77970bc91c955d0/output.txt Total script time: 22.52 mins
|
/botio makeref |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @brendandahl received. Current queue size: 0 Live output at: http://107.22.172.223:8877/7ac9fda05041cc6/output.txt |
From: Bot.io (Linux)ReceivedCommand cmd_makeref from @brendandahl received. Current queue size: 0 Live output at: http://107.21.233.14:8877/352f1b0cbaa9d80/output.txt |
From: Bot.io (Windows)SuccessFull output at http://107.22.172.223:8877/7ac9fda05041cc6/output.txt Total script time: 18.43 mins
|
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/352f1b0cbaa9d80/output.txt Total script time: 22.72 mins
|
@brendandahl Good to merge, right? |
Avoid getting stuck in empty nodes in the Pages tree when calling |Catalog_getPageDict| (issue 5644)
This is a tentative patch that fixes #5644.
The issue with the referenced PDF file is that the
Pages
dictionary contains a number of emptyKids
, and we were getting stuck in those nodes before reaching the actual pages.As far as I can tell this code is just an optimization, to avoid what is usually a number of unnecessary iterations, but isn't strictly necessary. The only way I could find to solve the particular issue referenced, and to avoid similar issues in the future, was to remove that code path. This might mean that the lookup becomes a tiny bit slower in very large files, but I unfortunately found no other way to fix the issue (hence why I'm labelling this as a tentative patch).Edit: Removing this optimization, as the first version of the patch did, would be bad for performance reasons (especially with ranged loading). Hence I've submitted a new, and hopefully better, version of the patch that only checks all
Kids
if we've actually encountered an empty node.I've checked a large number of the PDF files in the test suite, especially the longer ones (where the performance penalty would be especially bad), and didn't find any file where the special case introduced in this patch was actually hit.
Note: I'm adding an
eq
test, to ensure that we actually find the correct page for eachpageIndex
.