Allow text to be selectable/findable #10

westonruter · 2011-06-15T20:20:57Z

I'm sure this feature has been considered, but this library would be an order of magnitude cooler if the text in the PDF were interactive, that is, if it could be selected or traversed by the browser's find functionality.

I'm sure there are many reasons why the text should be embedded directly into the canvas (e.g. for layering), but could transparent text be layered on top of the canvas to allow it to still be selected? This text can be absolutely positioned and have a color of rgba(0,0,0,0.0). See demo: http://jsfiddle.net/westonruter/UGZWE/
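
A minimal sketch of that overlay idea, assuming the position and font size of each piece of text are already known in CSS pixels. The helper name `overlayStyle` and the shape of its `item` argument are hypothetical, made up for illustration; they are not part of pdf.js.

```javascript
// Hypothetical helper: builds the inline CSS for one transparent text span
// placed over the canvas. `item` is assumed to be { x, y, fontSize } in CSS
// pixels, with (x, y) the top-left corner of the text box.
function overlayStyle(item) {
  return [
    "position: absolute",
    "left: " + item.x + "px",
    "top: " + item.y + "px",
    "font-size: " + item.fontSize + "px",
    "color: rgba(0,0,0,0.0)", // fully transparent, but still selectable
    "white-space: pre",
  ].join("; ");
}

// In a browser the style could be applied roughly like this (not run here):
//   const span = document.createElement("span");
//   span.textContent = "Hello";
//   span.setAttribute("style", overlayStyle({ x: 72, y: 100, fontSize: 12 }));
//   container.appendChild(span); // container should be position: relative
```

The span only lines up with the rendered glyphs if its font metrics roughly match what was drawn on the canvas, so a real implementation would likely need to scale or stretch each span.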

vingtetun · 2011-06-16T03:10:12Z

This is definitely something we want to do. Chris Jones discusses some possible directions at the end of his blog post: http://blog.mozilla.com/cjones/2011/06/15/overview-of-pdf-js-guts/

For the moment we're still learning things about PDF and looking at what's missing on the browser side and what existing technologies (such as SVG) can do about it. Nothing has been decided about the right way to implement the selection feature, and we are open to suggestions, and even more open to patches! :)

Also, words inside a PDF are stored as chunks or individual letters, so in order to implement a search/selection feature one needs to figure out an algorithm that rebuilds the strings and determines which chunks belong together.
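
One possible shape for such an algorithm, as a rough sketch only: group chunks whose baselines are close into lines, sort each line left to right, and insert a space wherever the horizontal gap between neighbours is large. The chunk shape `{ str, x, y, width }`, the function name `rebuildLines`, and both threshold values are assumptions for illustration, not anything pdf.js defines.

```javascript
// Hypothetical input: chunks shaped like { str, x, y, width }, where (x, y)
// is the start of the chunk's baseline and y grows downward.
// yTolerance and gapThreshold are illustrative values in page units.
function rebuildLines(chunks, yTolerance = 2, gapThreshold = 3) {
  // Step 1: bucket chunks into lines by comparing baselines.
  const lines = [];
  const byY = chunks.slice().sort((a, b) => a.y - b.y);
  for (const chunk of byY) {
    const last = lines[lines.length - 1];
    if (last && Math.abs(chunk.y - last.y) <= yTolerance) {
      last.chunks.push(chunk);
    } else {
      lines.push({ y: chunk.y, chunks: [chunk] });
    }
  }
  // Step 2: within each line, order by x and join, adding a space when
  // the gap to the previous chunk exceeds the threshold.
  return lines.map((line) => {
    line.chunks.sort((a, b) => a.x - b.x);
    let text = "";
    let prevEnd = null;
    for (const c of line.chunks) {
      if (prevEnd !== null && c.x - prevEnd > gapThreshold) {
        text += " ";
      }
      text += c.str;
      prevEnd = c.x + c.width;
    }
    return text;
  });
}

const demoLines = rebuildLines([
  { str: "world", x: 30, y: 10, width: 25 },
  { str: "Hel", x: 0, y: 10, width: 15 },
  { str: "lo", x: 15, y: 10.5, width: 10 },
  { str: "Next", x: 0, y: 30, width: 20 },
]);
console.log(demoLines); // [ 'Hello world', 'Next' ]
```

Fixed thresholds like these would break on rotated text, mixed font sizes, or multi-column layouts; scaling them by font size is one obvious refinement.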

On my side, I'm busy working on extracting fonts from the document in order to render Type1 fonts via @font-face (they are not natively supported by the browser) and on rewriting, on the fly, badly formed TrueType fonts embedded in PDF documents (so that they pass the browser's font sanitizer...), but I would be more than happy to provide directions for implementing something or to discuss a solution.

joneschrisg · 2011-06-16T10:21:39Z

Basically, we have two options.
(1) Convert PDF to SVG, let browser do text selection/find. This is obviously attractive because the browser does the "hard work".
(2) Do text selection/find from within pdf.js, on canvas or SVG. This has the highest upside because we can use heuristics specific to PDFs to decide what text to select. For example, a vertical line extending most of the length of the page is probably a column separator. The browser can't assume these things.

Since (1) is less work for us, we're targeting that first. We'll have to see whether that works well enough for us to drop (2). There are probably many other ways to approach this problem.
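
To make the column-separator heuristic from option (2) concrete, here is a small sketch. It assumes the page's stroked line segments are available as `{ x1, y1, x2, y2 }` objects in page units; that shape, the function names, and the 60% length cutoff are all hypothetical choices for illustration.

```javascript
// Hypothetical: treat near-vertical segments that span at least
// `minFraction` of the page height as column separators, and return
// their x positions sorted left to right.
function findColumnSeparators(segments, pageHeight, minFraction = 0.6) {
  return segments
    .filter((seg) => {
      const isVertical = Math.abs(seg.x1 - seg.x2) < 1;
      const length = Math.abs(seg.y2 - seg.y1);
      return isVertical && length >= minFraction * pageHeight;
    })
    .map((seg) => (seg.x1 + seg.x2) / 2)
    .sort((a, b) => a - b);
}

// Column index of a text chunk = number of separators at or left of its x.
function columnOf(chunk, separators) {
  return separators.filter((x) => x <= chunk.x).length;
}

const demoSeparators = findColumnSeparators(
  [
    { x1: 300, y1: 50, x2: 300, y2: 750 }, // long vertical rule
    { x1: 0, y1: 100, x2: 600, y2: 100 }, // horizontal rule, ignored
    { x1: 100, y1: 10, x2: 100, y2: 60 }, // short vertical tick, ignored
  ],
  800
);
console.log(demoSeparators); // [ 300 ]
console.log(columnOf({ x: 320 }, demoSeparators)); // 1
```

A selection could then be restricted to chunks sharing the same column index, which is the kind of decision a generic browser selection cannot make.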

notmasteryet · 2011-09-08T03:29:25Z

Selectable text prototype at https://github.com/notmasteryet/pdf.js/tree/text-1, implemented with divs and transparent (no-color) text. It uses mozCurrentTransform, so it will only work with Firefox Beta, Aurora and Nightly. Something to play with...
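
A rough illustration of why the current transform is needed: to place a div over text drawn on the canvas, the text position in user space has to be mapped to canvas pixels. This sketch assumes the transform is given as a six-element array `[a, b, c, d, e, f]` in the usual canvas 2D convention; `applyTransform` is a hypothetical helper and is not taken from the prototype branch.

```javascript
// Maps a point through a 2D affine transform [a, b, c, d, e, f], using the
// same convention as the canvas transform(a, b, c, d, e, f) call:
//   x' = a*x + c*y + e
//   y' = b*x + d*y + f
function applyTransform(m, x, y) {
  const [a, b, c, d, e, f] = m;
  return { x: a * x + c * y + e, y: b * x + d * y + f };
}

// Example: scale by 2, flip the y axis, then shift by (10, 500).
const demoPoint = applyTransform([2, 0, 0, -2, 10, 500], 5, 100);
console.log(demoPoint); // { x: 20, y: 300 }
```

The resulting coordinates could then be used as the `left` and `top` of an absolutely positioned div.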

notmasteryet · 2011-09-08T03:39:32Z

(SVG prototype at https://github.com/andreasgal/pdf.js/issues/229#issuecomment-1651322)

arturadib · 2011-09-30T15:00:29Z

Added to Milestone.

Who wants to assign themselves to this issue?

@wfwalker

arturadib · 2012-02-28T20:24:57Z

Text selection has been implemented. There's another open issue for text search (see #819). Closing, please reopen if we missed something.

joneschrisg mentioned this issue Oct 27, 2011

Searching for text is impossible #706

Closed

arturadib mentioned this issue Oct 31, 2011

Text selection #738

Merged

ghost assigned arturadib Nov 18, 2011

jviereck mentioned this issue Dec 12, 2011

Search highlight obscures text #924

Closed

jruderman mentioned this issue Feb 15, 2012

Text search prototype #964

Closed

GPHemsley mentioned this issue Feb 15, 2012

Find in page is restricted to current chapter #1217

Closed

arturadib closed this as completed Feb 28, 2012

bovardtiberi-wf referenced this issue in Workiva/pdf.js Jul 15, 2013

Merge pull request #10 from WebFilings/jon/getPDFnocanvas

402777d

FIrst round of instructions generated from our artificial canvas context

This was referenced Aug 13, 2013

pdf.js errors with tizen framework #3500

Closed

jslint shows errors in pdf.js #3567

Closed

richerm mentioned this issue May 26, 2016

Issue about knockout groups. #3136

Open
