Make `XRef_indexObjects` more robust against bad PDF files (issue 5752) #6375

Snuffleupagus · 2015-08-21T15:29:19Z

This patch improves the detection of `xref` in files where it is followed by an arbitrary whitespace character (not just a line-breaking char).
It also adds a check for missing whitespace, e.g. `1 0 obj<<`, to speed up `readToken` for the PDF file in the referenced issue.
Finally, the patch also replaces a bunch of magic numbers with suitably named constants.
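
To illustrate the missing-whitespace check, here is a minimal sketch, not the code from this patch. It assumes that `readToken` collects bytes until a line break, so that also stopping at `<` keeps a token such as `1 0 obj<<...` short. The names `readTokenSketch`, `toBytes`, and the constants are hypothetical.

```javascript
// Hypothetical sketch; names and structure are assumptions, not pdf.js code.
// Named byte values instead of magic numbers.
var LF = 0x0A; // line feed
var CR = 0x0D; // carriage return
var LT = 0x3C; // '<'

// Collects characters starting at `offset` until a line break, a '<',
// or the end of the data, whichever comes first.
function readTokenSketch(data, offset) {
  var token = '';
  var ch = data[offset];
  while (offset < data.length && ch !== LF && ch !== CR && ch !== LT) {
    token += String.fromCharCode(ch);
    ch = data[++offset];
  }
  return token;
}

// Helper for the example: converts an ASCII string to an array of bytes.
function toBytes(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    bytes.push(str.charCodeAt(i));
  }
  return bytes;
}
```

With this sketch, `readTokenSketch(toBytes('1 0 obj<</Type /Catalog>>'), 0)` returns `'1 0 obj'` instead of scanning on to the next line break.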

Fixes #5752.

Also improves #6243, but there are still issues.

yurydelendik · 2015-08-21T15:34:48Z

src/core/obj.js

          continue;
        }
        var token = readToken(buffer, position);
        var m;
-        if (token === 'xref') {
+        if (/^xref\b/.test(token)) { // 'xref'


We need to minimize the use of regular expressions here. Can you either check token[0] === 'x' before running the regexp, or check token.indexOf('xref') === 0 and then check token[4] for whitespace?
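
The suggestion above can be sketched roughly as follows. This is an illustration under assumptions, not the code that was merged: `startsWithXRef` is a hypothetical name, and the whitespace set used here is a guess at what "whitespace" should cover. Note that the check is slightly stricter than `/^xref\b/`, which would also accept `xref<`.

```javascript
// Hypothetical helper; not taken from src/core/obj.js.
// Returns true if `token` is 'xref', or 'xref' followed by whitespace,
// without running a regular expression.
function startsWithXRef(token) {
  // Cheap rejection first: most tokens do not begin with 'x'.
  if (token.charCodeAt(0) !== 0x78 /* 'x' */) {
    return false;
  }
  if (token.indexOf('xref') !== 0) {
    return false;
  }
  if (token.length === 4) {
    return true; // exactly 'xref'
  }
  // The character right after 'xref' is at index 4.
  var ch = token.charCodeAt(4);
  return ch === 0x20 || // space
         ch === 0x09 || // tab
         ch === 0x0A || // line feed
         ch === 0x0D || // carriage return
         ch === 0x0C || // form feed
         ch === 0x00;   // null
}
```

For example, `startsWithXRef('xref 0 1')` is true, while `startsWithXRef('xrefs')` is false.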

yurydelendik · 2015-08-21T15:42:50Z

Looks good, thanks.

Snuffleupagus · 2015-08-21T18:38:23Z

/botio test

pdfjsbot · 2015-08-21T18:38:24Z

From: Bot.io (Linux)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/c32eb649daaa3a4/output.txt

pdfjsbot · 2015-08-21T18:38:24Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/769c445ff04b177/output.txt

pdfjsbot · 2015-08-21T18:57:28Z

From: Bot.io (Windows)

Success

Full output at http://107.22.172.223:8877/769c445ff04b177/output.txt

Total script time: 19.07 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2015-08-21T18:57:39Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/c32eb649daaa3a4/output.txt

Total script time: 19.25 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

yurydelendik · 2015-08-24T20:05:03Z

Thank you for the patch.

/botio makeref

pdfjsbot · 2015-08-24T20:05:03Z

From: Bot.io (Linux)

Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/2cc9ebfca847d25/output.txt

pdfjsbot · 2015-08-24T20:05:03Z

From: Bot.io (Windows)

Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/d8fecc92a41d404/output.txt

pdfjsbot · 2015-08-24T20:23:18Z

From: Bot.io (Windows)

Success

Full output at http://107.22.172.223:8877/d8fecc92a41d404/output.txt

Total script time: 18.24 mins

Lint: Passed
Make references: Passed
Check references: Passed

pdfjsbot · 2015-08-24T20:24:23Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/2cc9ebfca847d25/output.txt

Total script time: 19.33 mins

Lint: Passed
Make references: Passed
Check references: Passed

Snuffleupagus added the core label Aug 21, 2015

yurydelendik reviewed Aug 21, 2015

yurydelendik added a commit that referenced this pull request Aug 24, 2015

Merge pull request #6375 from Snuffleupagus/more-robust-XRef_indexObj…

5dcd409

…ects Make `XRef_indexObjects` more robust against bad PDF files (issue 5752)

yurydelendik merged commit 5dcd409 into mozilla:master Aug 24, 2015

Snuffleupagus deleted the more-robust-XRef_indexObjects branch August 24, 2015 20:25

Snuffleupagus mentioned this pull request Sep 1, 2015

PDF renders incorrectly with error #6243

Closed
