Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmaps: Use cmap.forEach instead of Array.forEach #6329

Merged
merged 1 commit into from
Aug 10, 2015

Conversation

Rob--W
Copy link
Member

@Rob--W Rob--W commented Aug 7, 2015

CMaps may be sparse. forEach is terribly slow in Chrome (and also in
Firefox) when the sparse array contains a key with a high value. E.g.

console.time('forEach sparse')
var a = [];
a[0xFFFFFF] = 1;
a.forEach(function(){});
console.timeEnd('forEach sparse');

// Chrome: 2890ms
// Firefox: 1345ms

@yurydelendik
Copy link
Contributor

What is our worst case scenario from the test files in terms of cmapKeys length?

@Rob--W
Copy link
Member Author

Rob--W commented Aug 8, 2015

Which test files? I was crafting a PDF+cmap for a bug report (not PDF.js)
and discovered that it took ages before the file was displayed, while the
CPU was under high load. The code snippet above shows the specific
issue/bottleneck.

If I'm not mistaken, UTF16 can consist of 4 bytes, so the upper bound of
valid values is 0xFFFFFFFF. The length of the cmapKeys array can be high,
but each cmap entry requires storage space in the cmap / PDF file, so
that's not something to worry about.
On Aug 8, 2015 1:48 AM, "Yury Delendik" [email protected] wrote:

What is our worst case scenario from the test files in terms of cmapKeys
length?


Reply to this email directly or view it on GitHub
#6329 (comment).

var cmapKeys = Object.keys(cmap);
for (var i = 0, ii = cmapKeys.length; i < ii; ++i) {
var charCode = cmapKeys[i];
var token = cmapKeys[charCode];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be cmap[charCode]? Currently https://cdn.gonitro.com/img/primo/PrimoPDF_V5_User_Guide.pdf breaks with this patch and I think it is due to this line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Fixed.

@Snuffleupagus
Copy link
Collaborator

I'm wondering if it wouldn't make more sense to re-factor the code to instead use the CMap.forEach method, e.g. something along these lines:

diff --git a/src/core/evaluator.js b/src/core/evaluator.js
index ed137f8..9685a91 100644
--- a/src/core/evaluator.js
+++ b/src/core/evaluator.js
@@ -1370,11 +1370,11 @@ var PartialEvaluator = (function PartialEvaluatorClosure() {
         if (cmap instanceof IdentityCMap) {
           return new IdentityToUnicodeMap(0, 0xFFFF);
         }
-        cmap = cmap.getMap();
+        var map = [];
         // Convert UTF-16BE
         // NOTE: cmap can be a sparse array, so use forEach instead of for(;;)
         // to iterate over all keys.
-        cmap.forEach(function(token, i) {
+        cmap.forEach(function(i, token) {
           var str = [];
           for (var k = 0; k < token.length; k += 2) {
             var w1 = (token.charCodeAt(k) << 8) | token.charCodeAt(k + 1);
@@ -1386,9 +1386,9 @@ var PartialEvaluator = (function PartialEvaluatorClosure() {
             var w2 = (token.charCodeAt(k) << 8) | token.charCodeAt(k + 1);
             str.push(((w1 & 0x3ff) << 10) + (w2 & 0x3ff) + 0x10000);
           }
-          cmap[i] = String.fromCharCode.apply(String, str);
+          map[i] = String.fromCharCode.apply(String, str);
         });
-        return new ToUnicodeMap(cmap);
+        return new ToUnicodeMap(map);
       }
       return null;
     },

CMaps may be sparse. Array.prototype.forEach is terribly slow in Chrome
(and also in Firefox) when the sparse array contains a key with a high
value. E.g.

    console.time('forEach sparse')
    var a = [];
    a[0xFFFFFF] = 1;
    a.forEach(function(){});
    console.timeEnd('forEach sparse');

    // Chrome: 2890ms
    // Firefox: 1345ms

Switching to CMap.prototype.forEach, which is optimized for such
scenarios fixes the problem.
@Rob--W
Copy link
Member Author

Rob--W commented Aug 8, 2015

@Snuffleupagus Nifty function. I've updated the PR again.

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

pdfjsbot commented Aug 8, 2015

From: Bot.io (Linux)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/9afea67b9ed2dcb/output.txt

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Aug 8, 2015

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/49c8654244fb67b/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Aug 8, 2015

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/d65bf0f6f6de518/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Aug 8, 2015

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/d65bf0f6f6de518/output.txt

Total script time: 18.34 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Aug 8, 2015

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/49c8654244fb67b/output.txt

Total script time: 19.15 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@CodingFabian
Copy link
Contributor

looks like a neat patch, but afaik the commit message does not hold anymore :)

@Rob--W Rob--W changed the title cmaps: Use Object.keys + for loop instead of .forEach cmaps: Use cmap.forEach instead of Array.forEach Aug 9, 2015
@Rob--W
Copy link
Member Author

Rob--W commented Aug 9, 2015

@CodingFabian The commit message was already updated. I've just edited the title of the PR to match the latest changes.

@CodingFabian
Copy link
Contributor

ah cool, thank you that's what I meant :)

Snuffleupagus added a commit that referenced this pull request Aug 10, 2015
cmaps: Use cmap.forEach instead of Array.forEach
@Snuffleupagus Snuffleupagus merged commit 00b798d into mozilla:master Aug 10, 2015
@Snuffleupagus
Copy link
Collaborator

Looks good, thank you for the patch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants