Avoid getting stuck in empty nodes in the Pages tree when calling |Catalog_getPageDict| (issue 5644) #5655

Snuffleupagus · 2015-01-17T12:26:40Z

This is a tentative patch that fixes #5644.

The issue with the referenced PDF file is that the Pages dictionary contains a number of empty Kids, and we were getting stuck in those nodes before reaching the actual pages.

As far as I can tell, the code path in question is just an optimization that avoids what is usually a number of unnecessary iterations; it isn't strictly necessary. The only way I could find to solve the particular issue referenced, and to avoid similar issues in the future, was to remove that code path. This might make the lookup slightly slower in very large files, but I unfortunately found no other way to fix the issue (which is why I'm labelling this as a tentative patch).

Edit: Removing this optimization, as the first version of the patch did, would hurt performance (especially with ranged loading). I've therefore submitted a new, and hopefully better, version of the patch that only checks all Kids once we've actually encountered an empty node.
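
To illustrate the idea of only checking all Kids after an empty node has been seen, here is a minimal sketch. It is a simplified, synchronous model using plain objects, not the actual patch or the pdf.js API: the names `getPage`, `isPagesNode` and `checkAllKids`, the sample tree, and the assumption that the shortcut is taken when `Count` equals the number of `Kids` are all chosen for illustration only.

```javascript
// Hypothetical model: a Pages node has Count and Kids, a leaf page has no Kids.
function isPagesNode(node) {
  return Array.isArray(node.Kids);
}

function getPage(root, pageIndex) {
  var nodesToVisit = [root];
  var currentPageIndex = 0;
  var checkAllKids = false; // becomes true once an empty node has been seen

  while (nodesToVisit.length > 0) {
    var node = nodesToVisit.pop();

    if (!isPagesNode(node)) {
      // Leaf page: either the one we want, or one more page to count.
      if (currentPageIndex === pageIndex) {
        return node;
      }
      currentPageIndex++;
      continue;
    }

    var count = node.Count;
    if (count === 0) {
      // An empty node means Count can no longer be trusted for the shortcut.
      checkAllKids = true;
    }
    // Skip subtrees that cannot contain the requested page.
    if (currentPageIndex + count <= pageIndex) {
      currentPageIndex += count;
      continue;
    }

    var kids = node.Kids;
    if (!checkAllKids && count === kids.length) {
      // Shortcut (assumed): every kid is taken to be a page, so jump directly.
      nodesToVisit = [kids[pageIndex - currentPageIndex]];
      currentPageIndex = pageIndex;
      continue;
    }
    // Fallback: visit every kid in document order.
    for (var i = kids.length - 1; i >= 0; i--) {
      nodesToVisit.push(kids[i]);
    }
  }
  throw new Error('Page index ' + pageIndex + ' not found.');
}

// Made-up tree with empty nodes in front of the real pages.
var p0 = { Type: 'Page', name: 'p0' };
var p1 = { Type: 'Page', name: 'p1' };
var inner = {
  Count: 2,
  Kids: [{ Count: 0, Kids: [] }, { Count: 2, Kids: [p0, p1] }]
};
var root = {
  Count: 2,
  Kids: [{ Count: 0, Kids: [] }, { Count: 0, Kids: [] }, inner]
};

console.log(getPage(root, 0).name); // p0
console.log(getPage(root, 1).name); // p1
```

In this sketch the shortcut stays active for well-formed trees, so the extra iterations only occur in files that actually contain empty nodes. Note that the flag only helps if an empty node is visited before the node where the shortcut would otherwise be taken.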

I've checked a large number of the PDF files in the test suite, especially the longer ones (where the performance penalty would be especially bad), and didn't find any file where the special case introduced in this patch was actually hit.

Note: I'm adding an eq test, to ensure that we actually find the correct page for each pageIndex.

Snuffleupagus · 2015-03-27T17:21:07Z

/botio test

pdfjsbot · 2015-03-27T17:21:08Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/c6bdd383996e251/output.txt

pdfjsbot · 2015-03-27T17:21:09Z

From: Bot.io (Linux)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/77970bc91c955d0/output.txt

pdfjsbot · 2015-03-27T17:38:40Z

From: Bot.io (Windows)

Success

Full output at http://107.22.172.223:8877/c6bdd383996e251/output.txt

Total script time: 17.52 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2015-03-27T17:43:40Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/77970bc91c955d0/output.txt

Total script time: 22.52 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

brendandahl · 2015-04-17T22:19:24Z

/botio makeref

pdfjsbot · 2015-04-17T22:19:24Z

From: Bot.io (Windows)

Received

Command cmd_makeref from @brendandahl received. Current queue size: 0

Live output at: http://107.22.172.223:8877/7ac9fda05041cc6/output.txt

pdfjsbot · 2015-04-17T22:19:24Z

From: Bot.io (Linux)

Received

Command cmd_makeref from @brendandahl received. Current queue size: 0

Live output at: http://107.21.233.14:8877/352f1b0cbaa9d80/output.txt

pdfjsbot · 2015-04-17T22:37:50Z

From: Bot.io (Windows)

Success

Full output at http://107.22.172.223:8877/7ac9fda05041cc6/output.txt

Total script time: 18.43 mins

Lint: Passed
Make references: Passed
Check references: Passed

pdfjsbot · 2015-04-17T22:42:07Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/352f1b0cbaa9d80/output.txt

Total script time: 22.72 mins

Lint: Passed
Make references: Passed
Check references: Passed

timvandermeij · 2015-04-18T11:34:59Z

@brendandahl Good to merge, right?

Snuffleupagus added other core and removed other labels Jan 17, 2015

Snuffleupagus added this to the 2015 Q1 milestone Jan 17, 2015

Avoid getting stuck in empty nodes in the Pages tree when calling |Catalog_getPageDict| (issue 5644)

888cbe0

Snuffleupagus assigned brendandahl Feb 22, 2015

brendandahl added a commit that referenced this pull request Apr 20, 2015

Merge pull request #5655 from Snuffleupagus/issue-5644

846eb96

Avoid getting stuck in empty nodes in the Pages tree when calling |Catalog_getPageDict| (issue 5644)

brendandahl merged commit 846eb96 into mozilla:master Apr 20, 2015

Snuffleupagus deleted the issue-5644 branch April 20, 2015 19:16

Snuffleupagus mentioned this pull request Feb 20, 2017

Error when I try to view a pdf ( uncaught exception: Page index 0 not found. ) #8088

Closed
