Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat fonts with the same font descriptor and encoding as aliases #4423

Merged
merged 1 commit into from
Apr 9, 2014

Conversation

chriskr
Copy link
Contributor

@chriskr chriskr commented Mar 9, 2014

Fix for issue #4339

Different fonts can point to the same font descriptor
(see #4339 for details). With this
commit such fonts are treated as aliases if they have also the same encoding.
The according info is stored on the font descriptor. This change must also
ensure that aliases use always the same font name because translated fonts
can get cleared depending on the CLEANUP_TIMEOUT setting.

@bthorben
Copy link
Contributor

bthorben commented Mar 9, 2014

I see what the code does, but I can't relate it to CLEANUP_TIMEOUT (I guess you mean in viewer.js?!). Why does resetting the unfinished pages crash the application and why does this code solves this?

@chriskr
Copy link
Contributor Author

chriskr commented Mar 9, 2014

If the the CLEANUP_TIMEOUT triggers Catalog.prototype.cleanup gets called at some point. That deletes translated on all cached fonts before resetting the font cache completely. We have kept a pointer to the translated font on the font descriptor, but after cleanup that font has no longer a translated font and it throws accordingly when rendering tasks restart. With this change we integrate better in the current design. We only keep a ref to a font which has the translated version. If the cache gets reset that ref is just no longer valid, but that doesn't do any harm. Additionally we ensure that the fontName for a descriptor-encoding pair is only created once because if the font cache gets cleared we need to ensure that the font is created with the same name.

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

pdfjsbot commented Mar 9, 2014

From: Bot.io (Linux)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/eb4657f53990b6f/output.txt

@timvandermeij
Copy link
Contributor

@chriskr Could you rebase the PR?

@chriskr
Copy link
Contributor Author

chriskr commented Mar 13, 2014

It's rebased now.

@brendandahl
Copy link
Contributor

The toUnicode map of the font also affects how the font is translated, so if encoding is the same but toUnicode is different this solution will break. Looking at a few of the toUnicode maps for fonts that use the same DescendantFonts, the toUnicode maps are pointing to different objects(although the streams appear to have identical data). We'll need to run a checksum or something on the toUnicode and combine that with that encoding id key.

var fontID = fontRef.num + '_' + fontRef.gen;
if (encodingID && isDict(descriptor)) {
if (!descriptor.has('fontAliases')) {
descriptor.set('fontAliases', Object.create(null));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer a solution that does not add properties to the dictionary. If we ever decide to add writing capabilities to pdf.js this would cause us to output this invalid property. We could use a refset instead https://github.com/mozilla/pdf.js/blob/master/src/core/obj.js#L172

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chriskr
Copy link
Contributor Author

chriskr commented Mar 18, 2014

Are you sure that fonts which point to the same font descriptor and the same encoding can have different ToUnicode maps? That doesn't seem to make any sense? If fonts pointing to the same font descriptor but to different ToUnicode maps they basically also have to point to the same different encoding? It's hard to imagine that something else makes sense?

About not adding new properties to the font descriptor, will look into that.

@brendandahl
Copy link
Contributor

It's unlikely they'd have different toUnicode maps, the only reason I could see it happening is that each page would only map the char codes that are used on the page. If there's anything I've learned from fixing lots of pdfs, it's to expect the unexpected. The nasa pdf is already an example of a poorly made pdf since it repeats so many fonts that could just be the same object.

@chriskr
Copy link
Contributor Author

chriskr commented Mar 18, 2014

Ok. Do you think we can count on that they are referenced so that we can just take the reference num and gen? Or do you think we have to hash the whole map? Or we could just abort the alias check if there is a toUnicode map but it's not a reference? From the examples which i have seen e.g. on the Nasa document it seems to expensive to hash the whole map.

@brendandahl
Copy link
Contributor

From a few of the fonts I checked in the nasa document, just using the ref/gen would cause us to not cache the streams (same stream contents, but different streams).
http://brendandahl.github.io/pdf.js.utils/browser/index.html

@chriskr
Copy link
Contributor Author

chriskr commented Mar 18, 2014

Ok, will look into this.

@chriskr
Copy link
Contributor Author

chriskr commented Mar 26, 2014

Pushed a new suggestion. Font aliases are stored on the descriptor as fontAliases. It detects a bit less aliases on the NASA document (it seems some fonts are defined with a subset of a unicode map of the 'master' font). On the NASA document

  • 3560 fonts are returned as aliases
  • 48 fonts are translated
  • it has 24 different font files (based on a binary hash)
    That means it could perhaps still be improved with some more clever checking. But we haven't seen other documents with these kind of duplicated fonts. If that is a more common pattern we could look at it again.

@chriskr
Copy link
Contributor Author

chriskr commented Mar 26, 2014

Linting fails because of an intentional fall through in a switch statement.


switch (tailLength) {
case 3:
k1 ^= data[blockCounts * 4 + 2] << 16;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add /* falls through */

@Snuffleupagus
Copy link
Collaborator

Also, it would be nice if you add the following mode lines at the top of "src/core/murmurhash3.js":

/* -*- Mode: Java; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* vim: set shiftwidth=2 tabstop=2 autoindent cindent expandtab: */

@timvandermeij
Copy link
Contributor

@chriskr Could you rebase this?

@chriskr
Copy link
Contributor Author

chriskr commented Apr 3, 2014

Rebased it.

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Apr 3, 2014

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/9336628ab4c29ad/output.txt

@timvandermeij
Copy link
Contributor

Windows test failed. Let's try again.

/botio-windows test

@pdfjsbot
Copy link

pdfjsbot commented Apr 3, 2014

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/c28d913e77aa9a1/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 3, 2014

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/9336628ab4c29ad/output.txt

Total script time: 25.57 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/9336628ab4c29ad/reftest-analyzer.html#web=eq.log

@yurydelendik
Copy link
Contributor

/botio-windows test

@pdfjsbot
Copy link

pdfjsbot commented Apr 3, 2014

From: Bot.io (Windows)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/dda8d743c589d60/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 4, 2014

From: Bot.io (Linux)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/176c8b40b571f2c/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 4, 2014

From: Bot.io (Windows)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/d69d76b4d2c851d/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 4, 2014

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/d69d76b4d2c851d/output.txt

Total script time: 1.35 mins

  • Font tests: FAILED
  • Unit tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/d69d76b4d2c851d/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Copy link
Collaborator

/botio-windows test

@pdfjsbot
Copy link

pdfjsbot commented Apr 4, 2014

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/6f4dfa3f178edb4/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 4, 2014

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/6f4dfa3f178edb4/output.txt

Total script time: 0.82 mins

  • Font tests: FAILED
  • Unit tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/6f4dfa3f178edb4/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Apr 4, 2014

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/176c8b40b571f2c/output.txt

Total script time: 25.86 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@brendandahl
Copy link
Contributor

/botio-windows test

@pdfjsbot
Copy link

pdfjsbot commented Apr 8, 2014

From: Bot.io (Windows)


Received

Command cmd_test from @brendandahl received. Current queue size: 0

Live output at: http://107.22.172.223:8877/817a4c1a139a11a/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 8, 2014

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/817a4c1a139a11a/output.txt

Total script time: 2.72 mins

  • Font tests: FAILED
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/817a4c1a139a11a/reftest-analyzer.html#web=eq.log

@yurydelendik
Copy link
Contributor

/botio-windows test

@pdfjsbot
Copy link

pdfjsbot commented Apr 8, 2014

From: Bot.io (Windows)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/4a219115633b65b/output.txt

@@ -52,6 +52,7 @@ var otherFiles = [
'core/jpx.js',
'core/jbig2.js',
'core/bidi.js',
'core/murmurhash3.js',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add this to WORKER_SRC_FILES in make.js as well.
An aside: who wants to add a build step to auto generate src lists from the make.js variable so we don't have to worry about this anymore? :)

@brendandahl
Copy link
Contributor

r+ pending the fix for my comment

@pdfjsbot
Copy link

pdfjsbot commented Apr 8, 2014

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/4a219115633b65b/output.txt

Total script time: 21.76 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Apr 8, 2014

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/1615e8074aabde7/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 8, 2014

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/ee5bfa9de944708/output.txt

…s aliases

Different fonts can point to the same font descriptor
(see mozilla#4339 for details). With this
commit such fonts are treated as aliases if they have also the same encoding
and the same toUnicode map. The according info is stored on the font descriptor.
This change must also ensure that aliases use always the same font name
because translated fonts can get cleared depending on the CLEANUP_TIMEOUT setting.
@chriskr
Copy link
Contributor Author

chriskr commented Apr 8, 2014

Rebased and fixed.

@pdfjsbot
Copy link

pdfjsbot commented Apr 8, 2014

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/ee5bfa9de944708/output.txt

Total script time: 21.83 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Apr 8, 2014

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/1615e8074aabde7/output.txt

Total script time: 25.93 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

brendandahl added a commit that referenced this pull request Apr 9, 2014
Treat fonts with the same font descriptor and encoding as aliases
@brendandahl brendandahl merged commit a6e5f31 into mozilla:master Apr 9, 2014
@brendandahl
Copy link
Contributor

Nice fix, thanks!

@chriskr chriskr deleted the font-aliases branch April 9, 2014 17:45
@chriskr
Copy link
Contributor Author

chriskr commented Apr 9, 2014

You're welcome! :)

@bthorben
Copy link
Contributor

bthorben commented Apr 9, 2014

Congratulations!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants