-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Treat fonts with the same font descriptor and encoding as aliases #4423
Conversation
I see what the code does, but I can't relate it to CLEANUP_TIMEOUT (I guess you mean in viewer.js?!). Why does resetting the unfinished pages crash the application and why does this code solves this? |
If the the CLEANUP_TIMEOUT triggers Catalog.prototype.cleanup gets called at some point. That deletes translated on all cached fonts before resetting the font cache completely. We have kept a pointer to the translated font on the font descriptor, but after cleanup that font has no longer a translated font and it throws accordingly when rendering tasks restart. With this change we integrate better in the current design. We only keep a ref to a font which has the translated version. If the cache gets reset that ref is just no longer valid, but that doesn't do any harm. Additionally we ensure that the fontName for a descriptor-encoding pair is only created once because if the font cache gets cleared we need to ensure that the font is created with the same name. |
/botio-linux preview |
From: Bot.io (Linux)ReceivedCommand cmd_preview from @timvandermeij received. Current queue size: 0 Live output at: http://107.21.233.14:8877/eb4657f53990b6f/output.txt |
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/eb4657f53990b6f/output.txt Total script time: 0.35 mins Published
|
@chriskr Could you rebase the PR? |
It's rebased now. |
The toUnicode map of the font also affects how the font is translated, so if encoding is the same but toUnicode is different this solution will break. Looking at a few of the toUnicode maps for fonts that use the same DescendantFonts, the toUnicode maps are pointing to different objects(although the streams appear to have identical data). We'll need to run a checksum or something on the toUnicode and combine that with that encoding id key. |
var fontID = fontRef.num + '_' + fontRef.gen; | ||
if (encodingID && isDict(descriptor)) { | ||
if (!descriptor.has('fontAliases')) { | ||
descriptor.set('fontAliases', Object.create(null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a solution that does not add properties to the dictionary. If we ever decide to add writing capabilities to pdf.js this would cause us to output this invalid property. We could use a refset instead https://github.com/mozilla/pdf.js/blob/master/src/core/obj.js#L172
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that fonts which point to the same font descriptor and the same encoding can have different ToUnicode maps? That doesn't seem to make any sense? If fonts pointing to the same font descriptor but to different ToUnicode maps they basically also have to point to the same different encoding? It's hard to imagine that something else makes sense? About not adding new properties to the font descriptor, will look into that. |
It's unlikely they'd have different toUnicode maps, the only reason I could see it happening is that each page would only map the char codes that are used on the page. If there's anything I've learned from fixing lots of pdfs, it's to expect the unexpected. The nasa pdf is already an example of a poorly made pdf since it repeats so many fonts that could just be the same object. |
Ok. Do you think we can count on that they are referenced so that we can just take the reference num and gen? Or do you think we have to hash the whole map? Or we could just abort the alias check if there is a toUnicode map but it's not a reference? From the examples which i have seen e.g. on the Nasa document it seems to expensive to hash the whole map. |
From a few of the fonts I checked in the nasa document, just using the ref/gen would cause us to not cache the streams (same stream contents, but different streams). |
Ok, will look into this. |
Pushed a new suggestion. Font aliases are stored on the descriptor as fontAliases. It detects a bit less aliases on the NASA document (it seems some fonts are defined with a subset of a unicode map of the 'master' font). On the NASA document
|
Linting fails because of an intentional fall through in a switch statement. |
|
||
switch (tailLength) { | ||
case 3: | ||
k1 ^= data[blockCounts * 4 + 2] << 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add /* falls through */
Also, it would be nice if you add the following mode lines at the top of "src/core/murmurhash3.js":
|
@chriskr Could you rebase this? |
Rebased it. |
/botio test |
From: Bot.io (Linux)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://107.21.233.14:8877/9336628ab4c29ad/output.txt |
Windows test failed. Let's try again. /botio-windows test |
From: Bot.io (Windows)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://107.22.172.223:8877/c28d913e77aa9a1/output.txt |
From: Bot.io (Linux)FailedFull output at http://107.21.233.14:8877/9336628ab4c29ad/output.txt Total script time: 25.57 mins
Image differences available at: http://107.21.233.14:8877/9336628ab4c29ad/reftest-analyzer.html#web=eq.log |
/botio-windows test |
From: Bot.io (Windows)ReceivedCommand cmd_test from @yurydelendik received. Current queue size: 0 Live output at: http://107.22.172.223:8877/dda8d743c589d60/output.txt |
From: Bot.io (Linux)ReceivedCommand cmd_test from @yurydelendik received. Current queue size: 0 Live output at: http://107.21.233.14:8877/176c8b40b571f2c/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @yurydelendik received. Current queue size: 0 Live output at: http://107.22.172.223:8877/d69d76b4d2c851d/output.txt |
From: Bot.io (Windows)FailedFull output at http://107.22.172.223:8877/d69d76b4d2c851d/output.txt Total script time: 1.35 mins
Image differences available at: http://107.22.172.223:8877/d69d76b4d2c851d/reftest-analyzer.html#web=eq.log |
/botio-windows test |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://107.22.172.223:8877/6f4dfa3f178edb4/output.txt |
From: Bot.io (Windows)FailedFull output at http://107.22.172.223:8877/6f4dfa3f178edb4/output.txt Total script time: 0.82 mins
Image differences available at: http://107.22.172.223:8877/6f4dfa3f178edb4/reftest-analyzer.html#web=eq.log |
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/176c8b40b571f2c/output.txt Total script time: 25.86 mins
|
/botio-windows test |
From: Bot.io (Windows)ReceivedCommand cmd_test from @brendandahl received. Current queue size: 0 Live output at: http://107.22.172.223:8877/817a4c1a139a11a/output.txt |
From: Bot.io (Windows)FailedFull output at http://107.22.172.223:8877/817a4c1a139a11a/output.txt Total script time: 2.72 mins
Image differences available at: http://107.22.172.223:8877/817a4c1a139a11a/reftest-analyzer.html#web=eq.log |
/botio-windows test |
From: Bot.io (Windows)ReceivedCommand cmd_test from @yurydelendik received. Current queue size: 0 Live output at: http://107.22.172.223:8877/4a219115633b65b/output.txt |
@@ -52,6 +52,7 @@ var otherFiles = [ | |||
'core/jpx.js', | |||
'core/jbig2.js', | |||
'core/bidi.js', | |||
'core/murmurhash3.js', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add this to WORKER_SRC_FILES in make.js as well.
An aside: who wants to add a build step to auto generate src lists from the make.js variable so we don't have to worry about this anymore? :)
r+ pending the fix for my comment |
From: Bot.io (Windows)SuccessFull output at http://107.22.172.223:8877/4a219115633b65b/output.txt Total script time: 21.76 mins
|
/botio test |
From: Bot.io (Linux)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://107.21.233.14:8877/1615e8074aabde7/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://107.22.172.223:8877/ee5bfa9de944708/output.txt |
…s aliases Different fonts can point to the same font descriptor (see mozilla#4339 for details). With this commit such fonts are treated as aliases if they have also the same encoding and the same toUnicode map. The according info is stored on the font descriptor. This change must also ensure that aliases use always the same font name because translated fonts can get cleared depending on the CLEANUP_TIMEOUT setting.
Rebased and fixed. |
From: Bot.io (Windows)SuccessFull output at http://107.22.172.223:8877/ee5bfa9de944708/output.txt Total script time: 21.83 mins
|
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/1615e8074aabde7/output.txt Total script time: 25.93 mins
|
Treat fonts with the same font descriptor and encoding as aliases
Nice fix, thanks! |
You're welcome! :) |
Congratulations! |
Fix for issue #4339
Different fonts can point to the same font descriptor
(see #4339 for details). With this
commit such fonts are treated as aliases if they have also the same encoding.
The according info is stored on the font descriptor. This change must also
ensure that aliases use always the same font name because translated fonts
can get cleared depending on the CLEANUP_TIMEOUT setting.