Make XRef_indexObjects more robust against bad PDF files (issue 5752) #6375

Merged
merged 1 commit into from
Aug 24, 2015

Snuffleupagus
Collaborator

This patch improves the detection of `xref` in files where it is followed by an arbitrary whitespace character (not just a line-breaking char).
It also adds a check for missing whitespace, e.g. `1 0 obj<<`, to speed up `readToken` for the PDF file in the referenced issue.
Finally, the patch also replaces a bunch of magic numbers with suitably named constants.

Fixes #5752.

Also improves #6243, but there are still issues.
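The missing-whitespace check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: `readTokenSketch`, `parseObjHeader`, `toBytes`, the constant names, and the regular expression are all hypothetical. The idea is that a token reader which stops only at line breaks would presumably run far past `1 0 obj<<` when no newline follows, so `<` is treated as an additional stop character.

```javascript
// Illustrative byte constants (names are assumptions, not copied from the patch).
const LF = 0x0a; // line feed
const CR = 0x0d; // carriage return
const LT = 0x3c; // '<'

// Read characters starting at `offset` until a line break or '<' is reached.
function readTokenSketch(data, offset) {
  let token = '';
  for (let i = offset; i < data.length; i++) {
    const ch = data[i];
    if (ch === LF || ch === CR || ch === LT) {
      break;
    }
    token += String.fromCharCode(ch);
  }
  return token;
}

// Hypothetical helper: parse "<num> <gen> obj" from the start of a token.
function parseObjHeader(token) {
  const m = /^(\d+)\s+(\d+)\s+obj\b/.exec(token);
  return m ? { num: Number(m[1]), gen: Number(m[2]) } : null;
}

// Hypothetical helper for building byte input from an ASCII string.
function toBytes(str) {
  return Uint8Array.from(str, (c) => c.charCodeAt(0));
}

const token = readTokenSketch(toBytes('1 0 obj<</Type/Catalog>>\nendobj'), 0);
console.log(JSON.stringify(token)); // "1 0 obj"
console.log(parseObjHeader(token));
```

With `<` as a stop character, the token ends right after `obj`, so the object header can be matched without first accumulating the dictionary that follows it.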

   continue;
 }
 var token = readToken(buffer, position);
 var m;
-if (token === 'xref') {
+if (/^xref\b/.test(token)) { // 'xref'
Contributor

We need to minimize use of regular expression here. Can you: check token[0] === 'x' before running regexp, or token.indexOf('xref') === 0 and then check token[3] for whitespace?
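One way to read this suggestion is sketched below. It is a hedged illustration only: `startsWithXref` and `isPdfWhitespace` are hypothetical names, and `startsWith` stands in for the `indexOf('xref') === 0` test mentioned in the comment. Note that the character following `xref` sits at index 4 of the token. The whitespace set used here is the six white-space characters defined by the PDF specification.

```javascript
// PDF white-space character codes: NUL, TAB, LF, FF, CR, SPACE.
const NUL = 0x00, TAB = 0x09, LF = 0x0a, FF = 0x0c, CR = 0x0d, SPACE = 0x20;

// Hypothetical helper: true if `code` is a PDF white-space character.
function isPdfWhitespace(code) {
  return code === SPACE || code === LF || code === CR ||
         code === TAB || code === FF || code === NUL;
}

// Hypothetical helper: true if the token is 'xref' alone, or 'xref'
// followed by white space. No regular expression is involved.
function startsWithXref(token) {
  // Cheap first-character check rejects most tokens immediately.
  if (token.length < 4 || token[0] !== 'x') {
    return false;
  }
  if (!token.startsWith('xref')) {
    return false;
  }
  // Either the token ends here, or the next character must be white space.
  return token.length === 4 || isPdfWhitespace(token.charCodeAt(4));
}

console.log(startsWithXref('xref'));      // true
console.log(startsWithXref('xref \t'));   // true
console.log(startsWithXref('xrefs'));     // false
console.log(startsWithXref('startxref')); // false
```

The first-character test keeps the common case, a token that is not `xref` at all, down to a single comparison.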

Collaborator Author

Done.

@yurydelendik
Contributor

Looks good, thanks.

@Snuffleupagus
Collaborator Author

/botio test

@pdfjsbot

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/c32eb649daaa3a4/output.txt

@pdfjsbot

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/769c445ff04b177/output.txt

@pdfjsbot

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/769c445ff04b177/output.txt

Total script time: 19.07 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/c32eb649daaa3a4/output.txt

Total script time: 19.25 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@yurydelendik
Contributor

Thank you for the patch.

/botio makeref

@pdfjsbot

From: Bot.io (Linux)


Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/2cc9ebfca847d25/output.txt

@pdfjsbot

From: Bot.io (Windows)


Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/d8fecc92a41d404/output.txt

yurydelendik added a commit that referenced this pull request Aug 24, 2015
…ects

Make `XRef_indexObjects` more robust against bad PDF files (issue 5752)
@yurydelendik yurydelendik merged commit 5dcd409 into mozilla:master Aug 24, 2015
@pdfjsbot

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/d8fecc92a41d404/output.txt

Total script time: 18.24 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/2cc9ebfca847d25/output.txt

Total script time: 19.33 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus deleted the more-robust-XRef_indexObjects branch August 24, 2015 20:25