Extract the actual decoding in `CCITTFaxStream` into a new `CCITTFaxDecoder` "class", which the new `CCITTFaxStream` depends on #9046

Snuffleupagus · 2017-10-19T14:44:25Z

Please note: This PR attempts to implement #8991 (comment), in an effort to help with (quickly) unblocking the actual JBig2 changes proposed in PR #8991.

@yurydelendik Is this approximately what you had in mind?

yurydelendik

I'm thinking we need to preserve the streaming nature of the streams and not consume all data at once, when that's possible.
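To illustrate the streaming concern, here is a toy sketch, not pdf.js code. `ToyDecoder` stands in for a real decoder and merely inverts each byte, and `countingSource` is a hypothetical helper that records how many input bytes have been pulled. An eager `parse()` drains the whole source up front, whereas a pull-based `readNextChar()` only consumes input as output is requested.

```javascript
// Hypothetical source: hands out bytes one at a time and counts how many were pulled.
function countingSource(bytes) {
  let pos = 0;
  return {
    pulled: 0,
    next() {
      if (pos >= bytes.length) {
        return -1; // EOF sentinel
      }
      this.pulled++;
      return bytes[pos++];
    },
  };
}

// Toy stand-in for a real decoder: "decodes" by inverting each byte.
class ToyDecoder {
  constructor(source) {
    this.source = source;
  }

  // Eager style: consumes all input before returning anything.
  parse() {
    const out = [];
    let b;
    while ((b = this.source.next()) !== -1) {
      out.push(~b & 0xff);
    }
    return Uint8Array.from(out);
  }

  // Streaming style: pulls only as much input as the next output byte needs.
  readNextChar() {
    const b = this.source.next();
    return b === -1 ? -1 : ~b & 0xff;
  }
}

const input = new Uint8Array(1024);

const eagerSource = countingSource(input);
new ToyDecoder(eagerSource).parse();

const lazySource = countingSource(input);
const lazy = new ToyDecoder(lazySource);
lazy.readNextChar();
lazy.readNextChar();

console.log(eagerSource.pulled, lazySource.pulled); // 1024 2
```

With a real CCITT decoder one output byte may need several input bytes, but the principle is the same: the caller decides how far decoding progresses.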

yurydelendik · 2017-10-23T15:39:41Z

src/core/ccitt_stream.js

+      EndOfBlock: this.params.get('EndOfBlock'),
+      BlackIs1: this.params.get('BlackIs1'),
+    });
+    let data = ccittFaxDecoder.parse(this.bytes, this.maybeLength);


Let's change this to pass a "source" of data, and this interface will have a single next() method (which will return -1 for EOF).

Let's change this to pass a "source" of data,

Hopefully I didn't completely misunderstand what you're asking for here :-)

I've changed the CCITTFaxDecoder constructor such that you can initialize CCITTFaxDecoder either with a stream (used from CCITTFaxStream) or with a Uint8Array (which seems like what you'd want in PR #8991 when decoding Huffman data).

and this interface will have a single next() method

Used the readNextChar name, as suggested in #9046 (comment).

(which will return -1 for EOF).

Good point, fixed now!
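A minimal sketch of such a "source", assuming it is backed by a plain `Uint8Array` (`createArraySource` and `drain` are hypothetical helpers, not pdf.js functions). Since `next()` returns either a byte in the range 0 to 255 or `-1`, callers can always tell EOF apart from a genuine `0x00` or `0xFF` byte while the return type stays a plain number.

```javascript
// Hypothetical helper: exposes a Uint8Array through the single-method
// source interface discussed above (next() returns 0..255, or -1 at EOF).
function createArraySource(bytes) {
  let pos = 0;
  return {
    next() {
      return pos < bytes.length ? bytes[pos++] : -1;
    },
  };
}

// Drain a source into an array, stopping at the -1 EOF sentinel.
function drain(source) {
  const out = [];
  let b;
  while ((b = source.next()) !== -1) {
    out.push(b);
  }
  return out;
}

const src = createArraySource(Uint8Array.of(0x00, 0xff, 0x7f));
console.log(drain(src)); // [ 0, 255, 127 ]
console.log(src.next()); // -1 (stays at EOF)
```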

yurydelendik · 2017-10-23T15:41:36Z

src/core/ccitt.js

+  }
+
+  CCITTFaxDecoder.prototype = {
+    parse(data, maybeMinBufferLength = 0) {


Let's remove this method and expose e.g. readNextChar() (formerly lookChar).

Sure, I've changed the name as suggested in the new version of the PR.

Edit: And thank you for taking the time to provide good feedback on these changes!

yurydelendik · 2017-10-23T15:43:13Z

src/core/ccitt_stream.js

+    configurable: true,
+  });
+
+  CCITTFaxStream.prototype.ensureBuffer = function() {


This looks like a wrong usage/override of ensureBuffer: ensureBuffer makes sure that this.buffer has enough memory allocated.

Agreed; this was removed in the new version.
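For reference, a simplified sketch of an `ensureBuffer(requested)` with that contract, assuming a doubling growth strategy; `GrowableBuffer` and `minBufferLength` are illustrative names here, and the actual `DecodeStream` implementation in pdf.js may differ in its details. The method only guarantees capacity and never decodes anything, which is why overriding it to trigger decoding was a misuse.

```javascript
// Illustrative sketch: a buffer whose ensureBuffer() only guarantees capacity.
class GrowableBuffer {
  constructor(minBufferLength = 512) {
    this.minBufferLength = minBufferLength;
    this.buffer = new Uint8Array(0);
    this.bufferLength = 0; // number of bytes actually filled
  }

  // Make sure this.buffer can hold at least `requested` bytes,
  // preserving existing contents; returns the (possibly new) buffer.
  ensureBuffer(requested) {
    const buffer = this.buffer;
    if (requested <= buffer.byteLength) {
      return buffer;
    }
    let size = this.minBufferLength;
    while (size < requested) {
      size *= 2;
    }
    const grown = new Uint8Array(size);
    grown.set(buffer);
    return (this.buffer = grown);
  }

  push(byte) {
    const buffer = this.ensureBuffer(this.bufferLength + 1);
    buffer[this.bufferLength++] = byte;
  }
}

const gb = new GrowableBuffer(4);
for (let i = 0; i < 10; i++) {
  gb.push(i);
}
console.log(gb.buffer.byteLength, gb.bufferLength); // 16 10
```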

yurydelendik · 2017-10-23T15:44:29Z

src/core/ccitt_stream.js

+  };
+
+  CCITTFaxStream.prototype.readBlock = function() {
+    this.ensureBuffer();


Let's keep this method as it was in the previous version for now.

I've reverted (most of) this, with the exception of the lookChar -> readNextChar rename and the handling of -1 return values.

yurydelendik · 2017-10-23T15:47:34Z

src/core/jbig2_stream.js

+    configurable: true,
+  });
+
+  Jbig2Stream.prototype.ensureBuffer = function() {


Looks like this was a hack: can we add the req parameter, and probably a comment saying that we are just ignoring the interface and moving all parsed data into the buffer?

Since this is a pre-existing issue, which is shared between JpegStream, JpxStream, and Jbig2Stream, could we perhaps do that in a follow-up instead?

The interfaces of all of those streams look kind of weird, and I'm actually a bit surprised that there haven't been any errors because of it. For example, none of them actually implements a readBlock method, and it seems more luck than anything else that we're not calling getBytes() (without providing a length) on those streams, since that would trigger a code path in getBytes that assumes readBlock exists.

One simple solution might be to just replace ensureBuffer with readBlock for the JpegStream, JpxStream, and Jbig2Stream streams, which seems more correct to me!?
Anyway, since this is an old issue, I'm hoping this point can be deferred to another PR.
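A simplified model of this concern, assuming (from the description above) that `getBytes()` without a length keeps calling `readBlock()` until `eof` is set. `BaseDecodeStream` and `AllAtOnceStream` are hypothetical names, not the pdf.js classes; the second one shows the suggested direction of implementing `readBlock()` (decoding everything in a single call) instead of overriding `ensureBuffer()`.

```javascript
// Hypothetical, heavily simplified base class modelled on the behaviour
// described above: getBytes() with no length relies on readBlock().
class BaseDecodeStream {
  constructor() {
    this.buffer = new Uint8Array(0);
    this.bufferLength = 0;
    this.pos = 0;
    this.eof = false;
  }

  readBlock() {
    throw new Error("readBlock() not implemented");
  }

  getBytes(length) {
    if (length === undefined) {
      // Read everything: keep decoding blocks until EOF.
      while (!this.eof) {
        this.readBlock();
      }
      const bytes = this.buffer.subarray(this.pos, this.bufferLength);
      this.pos = this.bufferLength;
      return bytes;
    }
    while (!this.eof && this.bufferLength < this.pos + length) {
      this.readBlock();
    }
    const end = Math.min(this.pos + length, this.bufferLength);
    const bytes = this.buffer.subarray(this.pos, end);
    this.pos = end;
    return bytes;
  }
}

// Suggested shape for an "all-at-once" decoder: implement readBlock(),
// move all decoded data into the buffer in one go, then signal EOF.
class AllAtOnceStream extends BaseDecodeStream {
  constructor(decodeAll) {
    super();
    this.decodeAll = decodeAll; // hypothetical callback returning a Uint8Array
  }

  readBlock() {
    const data = this.decodeAll();
    this.buffer = data;
    this.bufferLength = data.length;
    this.eof = true;
  }
}

const s = new AllAtOnceStream(() => Uint8Array.of(1, 2, 3, 4));
console.log(Array.from(s.getBytes(2)), Array.from(s.getBytes())); // [ 1, 2 ] [ 3, 4 ]
```

In this model, a subclass that only overrides `ensureBuffer()` would hit the throwing `readBlock()` as soon as someone calls `getBytes()` without a length.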


yurydelendik

Yes, that's what I had in mind. It's good to go after comments addressed.

yurydelendik · 2017-10-24T13:25:28Z

src/core/ccitt.js

+  function CCITTFaxDecoder(source, options = {}) {
+    if (isStream(source)) {
+      this.source = source;
+    } else if (source instanceof Uint8Array) {


We can remove this branch of code for now (and isStream check) and keep only this.source = source; in this patch.

Sure, fixed now.

yurydelendik · 2017-10-24T13:26:45Z

src/core/ccitt_stream.js

+  CCITTFaxStream.prototype.readBlock = function() {
+    while (!this.eof) {
+      let c = this.ccittFaxDecoder.readNextChar();
+      if (c === -1) {


This "if" is not present in the original code. Do you know what has changed?

I think that the old code was actually wrong here, since lookChar used to return null when EOF was reached. Hence we would actually attempt to insert null into this.buffer here, since the !this.eof check didn't occur until the next loop iteration.

In the new code, we first of all need to set this.eof = true; in the CCITTFaxStream, since various stream methods expect that property to exist (and CCITTFaxDecoder.eof ought to be left alone here).
Secondly, with the refactoring, readNextChar will return -1 in the EOF case, and we really don't want to insert that into this.buffer (setting uint8array[i] = -1 results in uint8array[i] === 255).
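The pitfall is easy to reproduce, since typed arrays convert assigned values modulo 2^8. Below is a sketch of a `readBlock()` loop that checks for the sentinel before storing; `decoder` is any object with a `readNextChar()` returning -1 at EOF, and the buffer handling is simplified to a fixed-size buffer instead of `ensureBuffer`.

```javascript
// Uint8Array stores values modulo 256, so -1 silently becomes 255.
const probe = new Uint8Array(1);
probe[0] = -1;
console.log(probe[0]); // 255

// Simplified readBlock(): stop at the -1 sentinel instead of storing it.
// `decoder` is assumed to expose readNextChar() returning 0..255 or -1.
function readBlock(stream, decoder) {
  while (!stream.eof) {
    const c = decoder.readNextChar();
    if (c === -1) {
      stream.eof = true; // the stream, not the decoder, tracks EOF here
      break;
    }
    stream.buffer[stream.bufferLength++] = c;
  }
}

// Hypothetical decoder yielding three bytes, then EOF.
const bytes = [0x10, 0x20, 0x30];
let i = 0;
const decoder = { readNextChar: () => (i < bytes.length ? bytes[i++] : -1) };

const stream = { buffer: new Uint8Array(8), bufferLength: 0, eof: false };
readBlock(stream, decoder);
console.log(stream.bufferLength, stream.eof); // 3 true
```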

yurydelendik · 2017-10-24T13:29:21Z

src/core/ccitt_stream.js

+    if (!isDict(params)) {
+      params = Dict.empty;
+    }
+    this.ccittFaxDecoder = new CCITTFaxDecoder(str, {


I was thinking we could wrap str into source above, e.g. const source = { next() { return str.getByte(); } };, but this will work fine for now. The next CCITTFaxDecoder user needs to follow the same interface of { getByte() { ... } }.

I was thinking we could wrap str into source above, e.g. const source = { next() { return str.getByte(); } };, but this will work fine for now.

That seems like a nice idea, so I figured that it cannot hurt to implement that while we're at it.
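Roughly, the suggested wrapping looks like this. `MiniStream` is a hypothetical stand-in for a pdf.js-style stream whose `getByte()` returns -1 at the end of data, and `toSource` is an illustrative adapter name. The adapter exposes only `next()`, so the decoder no longer depends on the full Stream interface, and any object providing `next()` can feed it.

```javascript
// Hypothetical stand-in for a pdf.js-style stream: getByte() returns -1 at end.
class MiniStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }

  getByte() {
    return this.pos < this.bytes.length ? this.bytes[this.pos++] : -1;
  }
}

// Adapter: narrow the stream down to the single-method source interface.
function toSource(str) {
  return {
    next() {
      return str.getByte();
    },
  };
}

const str = new MiniStream(Uint8Array.of(0xaa, 0xbb));
const source = toSource(str);
console.log(source.next(), source.next(), source.next()); // 170 187 -1
console.log(str.pos); // 2 (the position lives in the wrapped stream)
```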


janpe2 · 2017-10-24T14:07:12Z

I had a reason why I "faked" a Stream object in my PR #8991: I need to know the byte offset at which the CCITTFax decoding ends. When decoding MMR-encoded HalftoneRegions, there are many MMR images in a stream and they are separated only by EOFB codes. The JBIG2 data doesn't specify the start and end offset of each MMR halftone image. When I use the same Stream object to decode all the MMR bitmaps of a HalftoneRegion, the Stream remembers the end offset of each MMR image, so the decoding of the next one can continue right after it.

Would it be possible to retain this behavior in the new CCITTFaxDecoder? Or do you suggest that the JBIG2 decoder should pass a Uint8Array to the CCITTFaxDecoder constructor? That would make it difficult to return the offset at which the decoding ended.

Snuffleupagus · 2017-10-24T14:26:16Z

@janpe2 Regarding #9046 (comment):

The latest version of the PR now implements #9046 (comment), see also https://github.com/mozilla/pdf.js/pull/9046/files#diff-15fc1c0e1016b6c65b03effdef7cc5d9R17.
Does this help, in any way, with addressing the use-case you describe (since it should allow you to track the position in the source object provided when initializing CCITTFaxDecoder)?

janpe2 · 2017-10-24T14:36:02Z

Yes, looks good. Thank you.
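To sketch why a shared source helps here: if several images are decoded one after another from the same source object, the source's position after each decode marks where the next image starts. In this toy example, `createTrackedSource` and `decodeOne` are hypothetical, and `decodeOne` treats a `0x00` byte as its end-of-block marker. Real MMR data ends each image with a bit-level EOFB code instead, so this only models the offset bookkeeping, not CCITT decoding.

```javascript
// Hypothetical shared source over the whole segment, exposing its position.
function createTrackedSource(bytes) {
  return {
    pos: 0,
    next() {
      return this.pos < bytes.length ? bytes[this.pos++] : -1;
    },
  };
}

// Toy "decoder": collects bytes until a 0x00 terminator (standing in for EOFB)
// or EOF. It consumes the terminator, leaving the source at the next image.
function decodeOne(source) {
  const out = [];
  let b;
  while ((b = source.next()) !== -1 && b !== 0x00) {
    out.push(b);
  }
  return out;
}

// Three concatenated "images" with no explicit offsets between them.
const segment = Uint8Array.of(1, 2, 0, 3, 0, 4, 5, 6, 0);
const shared = createTrackedSource(segment);

const images = [];
const endOffsets = [];
while (shared.pos < segment.length) {
  images.push(decodeOne(shared));
  endOffsets.push(shared.pos);
}
console.log(images); // [ [ 1, 2 ], [ 3 ], [ 4, 5, 6 ] ]
console.log(endOffsets); // [ 3, 5, 9 ]
```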

Snuffleupagus · 2017-10-24T15:26:18Z

/botio test

pdfjsbot · 2017-10-24T15:26:19Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/2966c0e60a17d92/output.txt

pdfjsbot · 2017-10-24T15:26:20Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/dfb4b153ada88b1/output.txt

pdfjsbot · 2017-10-24T15:43:18Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/dfb4b153ada88b1/output.txt

Total script time: 16.95 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2017-10-24T15:49:12Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/2966c0e60a17d92/output.txt

Total script time: 22.88 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed


Snuffleupagus added core 4-work-in-progress and removed 4-work-in-progress labels Oct 19, 2017

mozilla deleted a comment from pdfjsbot Oct 19, 2017

Snuffleupagus requested a review from yurydelendik October 23, 2017 15:18

yurydelendik reviewed Oct 23, 2017


mozilla deleted a comment from pdfjsbot Oct 24, 2017

Move CCITTFaxStream and Jbig2Stream, from src/core/stream.js, to separate files (bb35095)

yurydelendik approved these changes Oct 24, 2017


Extract the actual decoding in CCITTFaxStream into a new `CCITTFaxDecoder` "class", which the new `CCITTFaxStream` depends on (e94a0fd)

Snuffleupagus merged commit ad74f6e into mozilla:master Oct 24, 2017

Snuffleupagus deleted the ccitt-jbig2-stream-refactor branch October 24, 2017 16:14

Snuffleupagus mentioned this pull request Oct 26, 2017

Fix the interface of JpegStream/JpxStream/Jbig2Stream to agree with the other DecodeStreams #9073

Merged
