Additional heuristics to recognize unknown glyphs for toUnicode (bug 1027533) #4980

Merged 1 commit into mozilla:master on Jun 24, 2014

Snuffleupagus
Collaborator

Since the introduction of adjustMapping in PR #4259, certain fonts depend much more heavily on toUnicode existing (and being correct); see e.g. fonts.js#L2479.

For the PDF file referenced in the bug, the toUnicode entries are of the form g00xx, so I've simply extended the heuristics to deal with that particular case of bad Unicode data.
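
To illustrate the kind of heuristic described above, here is a minimal sketch. It is not the code from this patch: `guessCodeFromGlyphName` and `glyphNameToString` are hypothetical names, and the sketch assumes that the four characters after `g` are hexadecimal digits encoding the character code.

```javascript
// Hypothetical helper, not the actual pdf.js implementation.
// Guesses a character code from a non-standard glyph name such as "g0041".
function guessCodeFromGlyphName(glyphName) {
  // Assumption: "g" followed by exactly four hex digits encodes the code.
  const match = /^g([0-9A-Fa-f]{4})$/.exec(glyphName);
  if (!match) {
    return -1; // Not of the g00xx form; leave it to other heuristics.
  }
  return parseInt(match[1], 16);
}

// Hypothetical wrapper: turn the guessed code into a string,
// or return an empty string when the name is not recognized.
function glyphNameToString(glyphName) {
  const code = guessCodeFromGlyphName(glyphName);
  return code >= 0 ? String.fromCharCode(code) : '';
}

console.log(glyphNameToString('g0041')); // prints "A"
```

A guess like this only makes the text selectable. Nothing guarantees that the recovered code is the correct Unicode value, so copied text can still be junk.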

Note: This patch doesn't fully fix copying. The text can now be selected, but copying it still yields junk (which is also the case in Adobe Reader).

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1027533.

@Snuffleupagus
Collaborator Author

/botio test

@pdfjsbot

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/84b11dade773f3f/output.txt

@pdfjsbot

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/800c23702a76648/output.txt

@pdfjsbot

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/84b11dade773f3f/output.txt

Total script time: 21.22 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/800c23702a76648/output.txt

Total script time: 23.76 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

yurydelendik added a commit that referenced this pull request Jun 24, 2014
Additional heuristics to recognize unknown glyphs for toUnicode (bug 1027533)
yurydelendik merged commit 10db93b into mozilla:master Jun 24, 2014
@yurydelendik
Contributor

Thanks
