Make `XRef_indexObjects` more robust against bad PDF files (issue 5752)
#6375
        continue;
      }
      var token = readToken(buffer, position);
      var m;
    - if (token === 'xref') {
    + if (/^xref\b/.test(token)) { // 'xref'
We need to minimize the use of regular expressions here. Can you check `token[0] === 'x'` before running the regexp, or use `token.indexOf('xref') === 0` and then check `token[4]` (the first character after `xref`) for whitespace?
Done.
Looks good, thanks.
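The reviewer's suggestion can be sketched as a regexp-free check. This is a minimal sketch, not the merged pdf.js code: the helper names `isXrefToken` and `isWhitespaceCode` are hypothetical, and the whitespace set is assumed to be the six PDF whitespace characters (NUL, TAB, LF, FF, CR, SPACE).

```javascript
// Hypothetical helper: true for the six PDF whitespace character codes
// (NUL, TAB, LF, FF, CR, SPACE).
function isWhitespaceCode(code) {
  return code === 0x00 || code === 0x09 || code === 0x0A ||
         code === 0x0C || code === 0x0D || code === 0x20;
}

// Hypothetical helper: the token must start with "xref" and either end
// there or continue with whitespace. No regular expression is involved.
function isXrefToken(token) {
  if (token.indexOf('xref') !== 0) {
    return false;
  }
  // "xref" occupies indices 0-3, so index 4 is the first character after it.
  return token.length === 4 || isWhitespaceCode(token.charCodeAt(4));
}

console.log(isXrefToken('xref'));      // true
console.log(isXrefToken('xref\t0 6')); // true
console.log(isXrefToken('xrefs'));     // false
console.log(isXrefToken('startxref')); // false
```

Unlike `/^xref\b/`, this sketch also rejects `xref` followed directly by a non-whitespace delimiter such as `<`; whether that difference matters depends on the files being handled.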
This patch improves the detection of `xref` in files where it is followed by an arbitrary whitespace character (not just a line-breaking character). It also adds a check for missing whitespace, e.g. `1 0 obj<<`, to speed up `readToken` for the PDF file in the referenced issue. Finally, the patch replaces a number of magic numbers with suitably named constants.

Fixes #5752. Also improves #6243, but there are still issues.
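A rough sketch of the other two changes, assuming `readToken` scans a byte array until it reaches a line break. The constant names `LF`, `CR` and `LT` are hypothetical stand-ins for the magic numbers, and stopping at `<` is shown as one way to handle the missing space in `1 0 obj<<` without reading the following dictionary (possibly the rest of the file) into the token.

```javascript
// Hypothetical named constants replacing magic character codes.
const LF = 0x0A; // line feed
const CR = 0x0D; // carriage return
const LT = 0x3C; // '<'

// Read a token starting at `offset`, stopping at a line break or at '<'.
// Without the '<' check, "1 0 obj<<...>>" on one line would be read
// into a single (potentially huge) token.
function readToken(data, offset) {
  let token = '';
  while (offset < data.length) {
    const ch = data[offset];
    if (ch === LF || ch === CR || ch === LT) {
      break;
    }
    token += String.fromCharCode(ch);
    offset++;
  }
  return token;
}

const bytes = new TextEncoder().encode('1 0 obj<</Type /Catalog>>\nendobj');
console.log(readToken(bytes, 0)); // "1 0 obj"
```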
/botio test
From: Bot.io (Linux): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://107.21.233.14:8877/c32eb649daaa3a4/output.txt
From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://107.22.172.223:8877/769c445ff04b177/output.txt
From: Bot.io (Windows): Success. Full output at http://107.22.172.223:8877/769c445ff04b177/output.txt
Total script time: 19.07 mins
From: Bot.io (Linux): Success. Full output at http://107.21.233.14:8877/c32eb649daaa3a4/output.txt
Total script time: 19.25 mins
Thank you for the patch. /botio makeref
From: Bot.io (Linux): Received. Command cmd_makeref from @yurydelendik received. Current queue size: 0. Live output at: http://107.21.233.14:8877/2cc9ebfca847d25/output.txt
From: Bot.io (Windows): Received. Command cmd_makeref from @yurydelendik received. Current queue size: 0. Live output at: http://107.22.172.223:8877/d8fecc92a41d404/output.txt
…ects Make `XRef_indexObjects` more robust against bad PDF files (issue 5752)
From: Bot.io (Windows): Success. Full output at http://107.22.172.223:8877/d8fecc92a41d404/output.txt
Total script time: 18.24 mins
From: Bot.io (Linux): Success. Full output at http://107.21.233.14:8877/2cc9ebfca847d25/output.txt
Total script time: 19.33 mins