Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add element to text layer even if width === 0 #7130

Merged
merged 1 commit into from
Apr 22, 2016
Merged

Add element to text layer even if width === 0 #7130

merged 1 commit into from
Apr 22, 2016

Conversation

nschloe
Copy link
Contributor

@nschloe nschloe commented Mar 29, 2016

Some browsers render certain special characters with width 0, others with strictly positive width. (For example, the Greek Delta, Δ, has width 0 in Google Chrome, and a positive width in Firefox.) The if clause in operation so far results in different text layer DOM trees for different browsers.

This commit fixes that by adding the elements independently of their width.

For the background:
This bug was discovered when using https://github.com/timdown/rangy, a tool to create and find text ranges in a DOM tree, used primarily for adding annotations.


This change is Reviewable

Some browsers render certain special characters with width 0, others with strictly positive width. (For example, the Greek Delta, Δ, has width 0 in Google Chrome, and a positive width in Firefox.) The `if` clause in operation so far results in different text layer DOM trees for different browsers.

This commit fixes that by adding the elements independently of their width.
@yurydelendik
Copy link
Contributor

Can you provide test PDF? (For verification) It will be nice to include that as text layer test.

@nschloe
Copy link
Contributor Author

nschloe commented Mar 29, 2016

Thanks for the swift reply! I suppose any paper with math with do, but the bug was dicovered with http://arxiv.org/pdf/1603.04246v1.pdf.

@timvandermeij
Copy link
Contributor

I'm a bit confused about this PR. First of all, I wonder why the width > 0 check was added in the first place. Aren't we regressing something with this patch? Second of all, I cannot spot a difference in text selection before and after this patch with the provided document. Opening it using https://mozilla.github.io/pdf.js/web/viewer.html (latest master) and http://107.21.233.14:8877/f8c440b8fb693e9/web/viewer.html (this patch) makes it impossible to select the delta sign in both versions for me using Firefox 45.0.1 on Arch Linux. In comparison, Okular allows me to select all special characters in the PDF. Am I missing something here?

The provided file seems to have been made with LaTeX, so I'm attaching a reduced test case that I made myself that can be included in this patch if you can reproduce the issue with this file too:
delta.pdf

The LaTeX source, for those interested, is:

\documentclass{article}
\usepackage[a4paper,margin=2.5cm]{geometry}
\begin{document}
    \huge{$\Delta$}
\end{document}

@yurydelendik
Copy link
Contributor

I wonder why the width > 0 check was added in the first place

It was introduced by
e086cf3#diff-dd42e9b2d2a652b8e312efa567698aa6L1850 in-place of textLength > 1 check (and also to better protect from div by 0).

@nschloe
Copy link
Contributor Author

nschloe commented Mar 30, 2016

Thanks @timvandermeij for the comments.

(this patch) makes it impossible to select the delta sign in both versions for me using Firefox 45.0.1 on Arch Linux. In comparison, Okular allows me to select all special characters in the PDF. Am I missing something here?

The \Delta can indeed still not be selected. The purpose of this PR is to get equal text layers in both Chrome and Firefox. So far, Chrome:

<div style="left: 491.912px; top: 1323.33px; font-size: 16.6043px; font-family: monospace;">1</div>
<div class="endOfContent"></div>

Firefox:

<div style="left: 143.017px; top: 117.739px; font-size: 34.4375px; font-family: monospace;"></div>
<div style="left: 491.912px; top: 1323.33px; font-size: 16.6043px; font-family: monospace;">1</div>
<div class="endOfContent"></div>

(The symbol has width 0 in Chrome and is thus stripped on master.)

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/04f19be6c3ac3a2/output.txt

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/d08f109d8de2ff5/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/a5c104438626c30/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/d08f109d8de2ff5/output.txt

Total script time: 20.49 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/a5c104438626c30/output.txt

Total script time: 22.77 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@timvandermeij
Copy link
Contributor

@yurydelendik Could you review this PR?

@yurydelendik
Copy link
Contributor

I agree with the change, but it's too fragile. I see whitespace check above, so we somewhat safe from regressions in most of the cases. It might be hard to add a test case since it's different across browsers/platforms. Thank you for the patch.

@yurydelendik yurydelendik merged commit 3d49879 into mozilla:master Apr 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants