Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract the actual decoding in CCITTFaxStream into a new CCITTFaxDecoder "class", which the new CCITTFaxStream depends on #9046

Merged
merged 2 commits into from
Oct 24, 2017

Conversation

Snuffleupagus
Copy link
Collaborator

@Snuffleupagus Snuffleupagus commented Oct 19, 2017

Please note: This PR attempts to implement #8991 (comment), in an effort to help with (quickly) unblocking the actual JBig2 changes proposed in PR #8991.

@yurydelendik Is this approximately what you had in mind?

@mozilla mozilla deleted a comment from pdfjsbot Oct 19, 2017
@mozilla mozilla deleted a comment from pdfjsbot Oct 19, 2017
@mozilla mozilla deleted a comment from pdfjsbot Oct 19, 2017
@mozilla mozilla deleted a comment from pdfjsbot Oct 19, 2017
Copy link
Contributor

@yurydelendik yurydelendik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we need to preserve a streaming nature of the streams and not consume all data as once, when it's possible.

EndOfBlock: this.params.get('EndOfBlock'),
BlackIs1: this.params.get('BlackIs1'),
});
let data = ccittFaxDecoder.parse(this.bytes, this.maybeLength);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change this to pass a "source" of data, and this interface will have a single next() methods (which will return -1 for EOF).

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Oct 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change this to pass a "source" of data,

Hopefully I didn't completely misunderstand what you're asking for here :-)

I've change the CCITTFaxDecoder constructor such that you can either initialize CCITTFaxDecoder with a stream (used from CCITTFaxStream), or pass in a Uint8Array instead (which seems like what you'd want in PR #8991 when decoding Huffman data).

and this interface will have a single next() methods

Used the readNextChar name, as suggested in #9046 (comment).

(which will return -1 for EOF).

Good point, fixed now!

}

CCITTFaxDecoder.prototype = {
parse(data, maybeMinBufferLength = 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this method and expose e.g. readNextChar() (former lookChar)

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Oct 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I've changed the name as suggested in the new version of the PR.

Edit: And thank you for taking the time to provide good feedback on these changes!

configurable: true,
});

CCITTFaxStream.prototype.ensureBuffer = function() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong usage/override of ensureBuffer -- ensureBuffer make sure that this.buffer has enough memory allocated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; this was removed in the new version.

};

CCITTFaxStream.prototype.readBlock = function() {
this.ensureBuffer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this method as in previous version for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted (most of) this, with the exception of the lookChar -> readNextChar rename and the handling of -1 return values.

configurable: true,
});

Jbig2Stream.prototype.ensureBuffer = function() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this was a hack: can we add req parameter and probably a comment that we just ignoring interface and moving all parsed data into the buffer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a pre-existing issue, which is shared between JpegStream, JpxStream, and Jbig2Stream, could we perhaps do that in a follow-up instead?

The interface of all of those streams look kind of weird, and I'm actually a bit surprised that there hasn't been any errors because of it. For example: None of them actually implement readBlock methods, and it seems more luck that anything else that we're not calling getBytes() (without providing a length) for those streams, since that would trigger a code-path in getBytes that assumes readBlock to exist.

One simple solution might to just replace ensureBuffer with readBlock for the JpegStream, JpxStream, and Jbig2Stream streams, which seems more correct to me!?
Anyway, since this is an old issue, I'm hoping this point can be deferred to another PR.

@mozilla mozilla deleted a comment from pdfjsbot Oct 24, 2017
@mozilla mozilla deleted a comment from pdfjsbot Oct 24, 2017
@mozilla mozilla deleted a comment from pdfjsbot Oct 24, 2017
@mozilla mozilla deleted a comment from pdfjsbot Oct 24, 2017
Copy link
Contributor

@yurydelendik yurydelendik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what I had in mind. It's good to go after comments addressed.

function CCITTFaxDecoder(source, options = {}) {
if (isStream(source)) {
this.source = source;
} else if (source instanceof Uint8Array) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this branch of code for now (and isStream check) and keep only this.source = source; in this patch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, fixed now.

CCITTFaxStream.prototype.readBlock = function() {
while (!this.eof) {
let c = this.ccittFaxDecoder.readNextChar();
if (c === -1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "if" is not present in the original code. Do you know what is changed?

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Oct 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the old code was actually wrong here, since lookChar used to return null when EOF was reached. Hence we would actually attempt to insert null into this.buffer here, since the !this.eof check didn't occur until the next loop iteration.

In the new code, we first of all need to set this.eof = true; in the CCITTFaxStream, since various stream methods expect that property to exist (and CCITTFaxDecoder.eof ought to be left alone here).
Secondly, with the refactoring readNextChar will return -1 in the EOF case, and we really don't want insert that into this.buffer. (Considering that setting uint8array[i] = -1 will lead to uint8array[i] === 255.)

if (!isDict(params)) {
params = Dict.empty;
}
this.ccittFaxDecoder = new CCITTFaxDecoder(str, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we could wrap str into source above, e.g. const source = { next() { return str.getByte(); } };, but this will work fine for now. Next CCITTFaxDecoder user need to follow the same interface of { getByte() { ... }}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we could wrap str into source above, e.g. const source = { next() { return str.getByte(); } };, but this will work fine for now.

That seems like a nice idea, so I figured that it cannot hurt to implement that while we're at it.

…ecoder` "class", which the new `CCITTFaxStream` depends on
@janpe2
Copy link
Contributor

janpe2 commented Oct 24, 2017

I had a reason why I "faked" a Stream object in my PR #8991: I need to know the byte offset at which the CCITTFax decoding ends. When decoding MMR-encoded HalftoneRegions, there are many MMR images in a stream and they are separated only by EOFB codes. The JBIG2 data doesn't specify the start and end offset of each MMR halftone image. When I use the same Stream object to decode all the MMR bitmaps of a HalftoneRegion, the Stream remembers the end offset of each MMR image, so the decoding of the next one can continue right after it.

Would it be possible to retain this behavior in the new CCITTFaxDecoder? Or do you suggest the JBIG2 decoder should pass a Uint8Array to the CCITTFaxDecoder constructor. This makes it difficult to return the offset at which the decoding ended.

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Oct 24, 2017

@janpe2 Regarding #9046 (comment):

The latest version of the PR now implements #9046 (comment), see also https://github.com/mozilla/pdf.js/pull/9046/files#diff-15fc1c0e1016b6c65b03effdef7cc5d9R17.
Does this help, in any way, with addressing the use-case you describe (since it should allow you to track the position in the source object provided when initializing CCITTFaxDecoder)?

@janpe2
Copy link
Contributor

janpe2 commented Oct 24, 2017

Yes, looks good. Thank you.

@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/2966c0e60a17d92/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/dfb4b153ada88b1/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/dfb4b153ada88b1/output.txt

Total script time: 16.95 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/2966c0e60a17d92/output.txt

Total script time: 22.88 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@Snuffleupagus Snuffleupagus merged commit ad74f6e into mozilla:master Oct 24, 2017
@Snuffleupagus Snuffleupagus deleted the ccitt-jbig2-stream-refactor branch October 24, 2017 16:14
movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018
…-refactor

Extract the actual decoding in `CCITTFaxStream` into a new `CCITTFaxDecoder` "class", which the new `CCITTFaxStream` depends on
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants