Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EnhanceTextSelection] Make expandTextDivs more efficient by updating all styles at once instead of piecewise #7632

Merged
merged 1 commit into from
Sep 15, 2016
Merged

Conversation

Snuffleupagus
Copy link
Collaborator

@Snuffleupagus Snuffleupagus commented Sep 14, 2016

I intended to provide proper benchmarking results here, as outlined in https://github.com/mozilla/pdf.js/wiki/Benchmarking-your-changes, but after wasting a couple of hours over the weekend getting weird results I gave up.
It appears that there's a lot of, i.e. way too much, variance between subsequent runs of text tests for the results to be meaningful.
(Previously I've only benchmarked eq tests, so I don't know if the text tests has never worked well or if it's a newer problem. For reference, please see the results of back-to-back benchmark runs on the current master with a very simple manifest file: https://gist.github.com/Snuffleupagus/eec3acc22f519eddce6404be883b6960.)

Instead I used console.time/timeEnd in appendText and expandTextDivs to be able to compare the performance with/without the patch. The entire viewer was (skip-cache) reloaded between measurements, and the result are available here: https://gist.github.com/Snuffleupagus/33fc971b653e6524bc889216ff95499c.
Given the troubles I've had with benchmarking, I've not yet computed any statistics on the results (e.g. mean, variance, confidence intervals, and so on).
However, just by looking at the data I think it's safe to say that this patch first of all doesn't seem to regress the current performance. Secondly it certainly looks very likely that this patch actually improves the performance, especially for the one-glyph-per-text-div case (cf. issue 7224).

Re: issue #7584.

/cc @timvandermeij


This change is Reviewable

@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/ef403cd52fad547/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/e96954c98dd3cb6/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/e96954c98dd3cb6/output.txt

Total script time: 24.43 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/ef403cd52fad547/output.txt

Total script time: 39.30 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/ef403cd52fad547/reftest-analyzer.html#web=eq.log

textDiv.style.fontFamily = style.fontFamily;
textDivProperties.style = 'left: ' + left + 'px; top: ' + top +
'px; font-size: ' + fontHeight +
'px; font-family: ' + style.fontFamily + ';';
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we might actually want to do something similar to PR #5033 for this string!?

Copy link
Contributor

@timvandermeij timvandermeij Sep 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'm fine with that, as long as we keep a similar comment above it (since otherwise it may look like magic that may be removed later on).

Maybe you could check how often this method is called for the one-glyph-per-div case to determine whether we actually need this optimization or not? If it's called over a thousand times, it should be useful, while otherwise it may be unnecessary. I'm fine with both ways, so let's pick the one you think is most useful here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a good case, such as the tracemonkey paper, there seem to be around 150 divs per page. So even for well behaved documents, it might make sense to avoid these intermediate strings when considering an entire document.

For a really bad case, such as the PDF from issue 7224, there's around 2000 divs per page, and in this case it probably makes a lot of sense to avoid intermediate strings.

I'll update the code, and add a comment about why we're doing this.

…ng all styles at once instead of piecewise

I intended to provide proper benchmarking results here, as outlined in https://github.com/mozilla/pdf.js/wiki/Benchmarking-your-changes, but after wasting a couple of hours over the weekend getting weird results I gave up.
It appears that there's a lot of, i.e. way too much, variance between subsequent runs of `text` tests for the results to be meaningful.
(Previously I've only benchmarked `eq` tests, so I don't know if the `text` tests has never worked well or if it's a newer problem. For reference, please see the results of back-to-back benchmark runs on the current `master` with a *very* simple manifest file: [link here].)

Instead I used `console.time/timeEnd` in `appendText` and `expandTextDivs` to be able to compare the performance with/without the patch. The entire viewer was (skip-cache) reloaded between measurements, and the result are available here: [link here].
Given the troubles I've had with benchmarking, I've not yet computed any statistics on the results (e.g. mean, variance, confidence intervals, and so on).
However, just by looking at the data I think it's safe to say that this patch first of all doesn't seem to regress the current performance. Secondly it certainly looks *very* likely that this patch actually improves the performance, especially for the one-glyph-per-text-div case (cf. issue 7224).

Re: issue 7584.
@Snuffleupagus
Copy link
Collaborator Author

Comments have been addressed.

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/95a4df768e50969/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/c56b1885962fbb2/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/c56b1885962fbb2/output.txt

Total script time: 24.59 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/95a4df768e50969/output.txt

Total script time: 39.52 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/95a4df768e50969/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Copy link
Collaborator Author

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/2e4660abe7a0f40/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/2e4660abe7a0f40/output.txt

Total script time: 1.32 mins

Published

@timvandermeij
Copy link
Contributor

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/8ec5f6a456954e4/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/533677ffa739ff2/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/8ec5f6a456954e4/output.txt

Total script time: 24.27 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/533677ffa739ff2/output.txt

Total script time: 38.15 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@timvandermeij timvandermeij merged commit 26da2d5 into mozilla:master Sep 15, 2016
@timvandermeij
Copy link
Contributor

Awesome work!

@Snuffleupagus Snuffleupagus deleted the more-efficient-expandTextDivs branch September 15, 2016 16:23
movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018
…andTextDivs

[EnhanceTextSelection] Make `expandTextDivs` more efficient by updating all styles at once instead of piecewise
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants