Attempt to ignore multiple identical Tf (setFont) commands in PartialEvaluator_getTextContent (issue 5808) #7387

Merged (1 commit) on Aug 30, 2016


Snuffleupagus
Collaborator

This patch improves performance for the PDF file in issue #5808, but I'm not sure if it's enough to call the issue fixed. On average, it reduces the number of textLayer divs by a factor of 3 and the time spent in getTextContent by a factor of ~2.

The PDF file is generated by Scribus PDF, which for reasons I cannot understand is placing redundant Tf commands before every showText command.
Note that the PDF file also contains lots of (basically) identical fonts with slightly different names, which causes unnecessary font-switching. This in turn causes some unnecessary breaking of textLayer divs, but it cannot be easily worked around.
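
To illustrate the idea, here is a minimal sketch of skipping a Tf (setFont) command when it repeats the font name and size that are already active. It is not the actual PartialEvaluator_getTextContent code: the operator shape `{ fn, args }` and the function name `collectTextRuns` are assumptions made up for this example.

```javascript
// Hypothetical operator shape: { fn: "setFont" | "showText", args: [...] }.
// Illustration only; this is not how PDF.js represents operators internally.
function collectTextRuns(operators) {
  const runs = [];
  let current = null; // text run currently being built
  let fontName = null;
  let fontSize = null;

  // Close the current run (if any) so the next showText starts a new one.
  function flush() {
    if (current !== null && current.str.length > 0) {
      runs.push(current);
    }
    current = null;
  }

  for (const op of operators) {
    if (op.fn === "setFont") {
      const [name, size] = op.args;
      if (name === fontName && size === fontSize) {
        // Redundant Tf: same font and size as before, so do not break the run.
        continue;
      }
      flush();
      fontName = name;
      fontSize = size;
    } else if (op.fn === "showText") {
      if (current === null) {
        current = { fontName, fontSize, str: "" };
      }
      current.str += op.args[0];
    }
  }
  flush();
  return runs;
}
```

Without the early `continue`, every repeated Tf would flush the current run, and each showText would end up in its own run (and, by analogy, its own textLayer div). Fonts that are identical apart from their names still compare as different here, which matches the limitation described above.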

/cc @timvandermeij

@timvandermeij
Contributor

/botio-linux preview

@pdfjsbot

pdfjsbot commented Jun 3, 2016

From: Bot.io (Linux)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/7df65102aa1c333/output.txt

@timvandermeij
Contributor

Looks really good to me. I also wouldn't call it fixed entirely, but it's a large step in the right direction. I think this will also benefit other Scribus-generated PDF files that we do not know of yet.

@Snuffleupagus
Collaborator Author

/botio test

@pdfjsbot

pdfjsbot commented Jun 4, 2016

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/58a616c15cfd60f/output.txt

@pdfjsbot

pdfjsbot commented Jun 4, 2016

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/bd6618735d5bc20/output.txt

@pdfjsbot

pdfjsbot commented Jun 4, 2016

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/58a616c15cfd60f/output.txt

Total script time: 21.15 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot

pdfjsbot commented Jun 4, 2016

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/bd6618735d5bc20/output.txt

Total script time: 27.68 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@Snuffleupagus
Collaborator Author

/botio test

@pdfjsbot

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/53d1ce9ac6378ba/output.txt

@pdfjsbot

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/8a729af834d872e/output.txt

@pdfjsbot

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/8a729af834d872e/output.txt

Total script time: 23.07 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/53d1ce9ac6378ba/output.txt

Total script time: 28.64 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@yurydelendik
Contributor

r+

@yurydelendik
Contributor

/botio makeref

@pdfjsbot

From: Bot.io (Windows)


Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/0151ecc1ea14a1f/output.txt

@pdfjsbot

From: Bot.io (Linux)


Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/31194223efbc016/output.txt

@yurydelendik yurydelendik merged commit ffa9939 into mozilla:master Aug 30, 2016
@yurydelendik
Contributor

Thank you for the patch

@pdfjsbot

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/0151ecc1ea14a1f/output.txt

Total script time: 23.30 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/31194223efbc016/output.txt

Total script time: 27.75 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus deleted the issue-5808 branch August 30, 2016 21:00