-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Text selection #738
Text selection #738
Conversation
Wow, this PR addresses one of the oldest issues we have (#10)! :) |
Conflicts: src/core.js
The reason it's not working on Chrome is Are we waiting for Chrome to support something like our |
currentTransform isn't available in webkit/chrome yet. It just shipped in ff7 so we need to propose it to whatwg, when one of us has time. |
Conflicts: src/core.js
Conflicts: src/fonts.js
textLayer.appendChild(textDivs[i]); | ||
// Adjust div width (via letterSpacing) to match canvas text | ||
// Due to the .offsetWidth calls, this is slow | ||
textDivs[i].style.letterSpacing = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering, if we have to do this for all divs. It seems like the spacing values are very small (0.1...) and removing it makes the test go wrong a little bit, but it's not that much noticeable as long as the errors don't accumilate too much. Having a little bit wrong selection might be bettern then beging 100% correct but therefore a lot slower?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the difference is sometimes enough for the highlight to "miss" a whole word. It might look weird for the end user that they're selecting a column of text and the highlight doesn't correspond to the words they want.
Incidentally, this "align-at-all-costs" hack fortuitously fixes the letter spacing in cases where the DOM-loaded font has missing/messed up spacing information (try commenting out the above line in f1040.pdf
for example). This might be a bug in our font loader/decoder algo though, not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify: the slow call to .offsetWidth()
is done lazily, so that it only fires up after the UI has settled on a given page. In my experience that seems to avoid the UI hiccups. Please let me know if you've noticed any UI performance hit.
I played around with this branch and it's quite impressive, how good this works. Artur mentioned all the things I had to "complain" about in the text on the pull request on top. @arturadib, good work. There are some details I'm not sure if they are good the way they are right now, but I'm running little on time right now. Hope to take a closer look later. |
@notmasteryet The encoding patch fixed Somehow however the font in Any ideas of what might be going on? Thanks! |
@jviereck Hey thanks for taking a peek in here! Let me know if you have any other suggestions or comments. |
In http://thefitdoc.com/page28/coaching/files/P90X%20Fit%20Test.pdf some text regions are displayed in wrong locations. |
In tracemonkey it seems all the ligatures are messed up. Are we hoping that the unicode stuff that Yury's is working will fix this? So in terms of landing this what do we need to do?
As far as search and text selection. I wonder if we should try and play around with making a super lightweight partial evaluator that will only emit text and all the location/size specifics. I'd be interested to see how poppler does text search. Not really an important issue at the moment, but if you want a really fun one http://ruby-pdf.rubyforge.org/pdf-writer/demos/demo.pdf Could we perhaps support rotation by using the new css3 transform: rotation stuff? |
Wasn't @digitarald working on making things CSS3 accelerated?
Have you seen my comment here: #819 (comment), issue #819? |
blocked by #829 |
…lect Conflicts: src/canvas.js web/viewer.js
blocked by #838 unicode normalization can be added when HTML markup is created |
…lect Conflicts: src/canvas.js
@pdfjsbot test |
Processing command test by user arturadib. Queue size: 0 Live script output is available (after queueing is done) at: http://184.73.87.52:8989/2946257.txt [bot:processed:2946257] |
ERROR(s) found Output:
Bot response time: 26.31 mins |
@pdfjsbot test |
Processing command test by user arturadib. Queue size: 0 Live script output is available (after queueing is done) at: http://184.73.87.52:8989/2948602.txt [bot:processed:2948602] |
All tests passed, but with WARNING(s). Make sure to read them! :). Output:
Bot response time: 26.09 mins |
|
||
// TODO actual characters can be extracted from the glyph.unicode | ||
text.str += char === ' ' ? ' ' : char; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is glyph.unicode
not working?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var unicodeText = glyph.unicode;
text.str += unicodeText === ' ' ? ' ' : unicodeText;
Will make selection work for http://unicode.org/charts/PDF/U0C80.pdf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, forgot about that, thanks!
…lect Conflicts: src/canvas.js
Processing command test by user arturadib. Queue size: 0 Live script output is available (after queueing is done) at: http://184.73.87.52:8989/2976621.txt [bot:processed:2976621] |
All tests passed, but with WARNING(s). Make sure to read them! :). Output:
Bot response time: 26.62 mins |
@@ -269,7 +275,37 @@ var CanvasGraphics = (function canvasGraphics() { | |||
}, | |||
|
|||
endDrawing: function canvasGraphicsEndDrawing() { | |||
var self = this; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Declare it next to where it is used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@pdfjsbot test |
Processing command test by user arturadib. Queue size: 0 Live script output is available (after queueing is done) at: http://184.73.87.52:8989/2979094.txt [bot:processed:2979094] |
All tests passed, but with WARNING(s). Make sure to read them! :). Output:
Bot response time: 26.74 mins |
Presently works OK (not perfect) with TraceMonkey, ECMA262, and a few others.
Open
viewer.html
as usual to test it. Note that there's now a<div class="textLayer">
on top of our<canvas>
.To see what's happening "under the hood", change from
color:transparent
tocolor:blue
inviewer.css: .textLayer > div
. (Warning: it's not pretty, it's hacky :)).It's hacky because we can't simply add one (absolutely-positioned)
<div>
or<span>
per word to the DOM as that would be prohibitively slow and memory-consuming. So we need some heuristics to render onediv
pershowSpacedText
line, and to make that DOM width match the canvas text width (I've borrowed some ideas from Crocodoc).Another hack is that because of this width matching, we need to call
.offsetWidth
for every<div>
we're inserting - which is slow and locks the UI. So I'm rendering the text layer lazily using a combination ofsetTimeout
andclearTimeout
, so that we only render things when we have stopped loading pages for a while (currently 500 ms).Known issues:
Apparently we're selecting glyphs and not characters, so sometimes things likefi
are selected as a single character (try selecting e.g. the word "difficult" in the first line oftracemonkey.pdf
)Some characters display OK but are copied messed up (pdkids.pdf
)Some PDFs are not working at all (i9.pdf
)Help and feedback always welcome :)
UPDATE:
mozCurrentTransform
)f1040.pdf
PDFsi9
andpdkids
come out correct in the DOM text, but when copied to clipboard come out messed up. I suspect there's a problem with our font encoding. Perhaps after Changing how encoding and width detection is done #717 lands this will be fixed?UPDATE 2 (after Yury's encoding patch)
i9.pdf
f1040
has lost/messed letter spacing information (presumably from font). Selection still works for full sentences (as the algorithm aligns beginning-ending of sentences at all costs), but it's awkward for words.UPDATE 3 (after Yury's Unicode CMap patch)
pdkids
is fixed@vingtetun @brendandahl @notmasteryet @jviereck