-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract the actual decoding in CCITTFaxStream
into a new CCITTFaxDecoder
"class", which the new CCITTFaxStream
depends on
#9046
Extract the actual decoding in CCITTFaxStream
into a new CCITTFaxDecoder
"class", which the new CCITTFaxStream
depends on
#9046
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking we need to preserve a streaming nature of the streams and not consume all data as once, when it's possible.
src/core/ccitt_stream.js
Outdated
EndOfBlock: this.params.get('EndOfBlock'), | ||
BlackIs1: this.params.get('BlackIs1'), | ||
}); | ||
let data = ccittFaxDecoder.parse(this.bytes, this.maybeLength); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's change this to pass a "source" of data, and this interface will have a single next() methods (which will return -1 for EOF).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's change this to pass a "source" of data,
Hopefully I didn't completely misunderstand what you're asking for here :-)
I've change the CCITTFaxDecoder
constructor such that you can either initialize CCITTFaxDecoder
with a stream (used from CCITTFaxStream
), or pass in a Uint8Array
instead (which seems like what you'd want in PR #8991 when decoding Huffman data).
and this interface will have a single next() methods
Used the readNextChar
name, as suggested in #9046 (comment).
(which will return -1 for EOF).
Good point, fixed now!
src/core/ccitt.js
Outdated
} | ||
|
||
CCITTFaxDecoder.prototype = { | ||
parse(data, maybeMinBufferLength = 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this method and expose e.g. readNextChar() (former lookChar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I've changed the name as suggested in the new version of the PR.
Edit: And thank you for taking the time to provide good feedback on these changes!
src/core/ccitt_stream.js
Outdated
configurable: true, | ||
}); | ||
|
||
CCITTFaxStream.prototype.ensureBuffer = function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong usage/override of ensureBuffer -- ensureBuffer make sure that this.buffer has enough memory allocated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed; this was removed in the new version.
src/core/ccitt_stream.js
Outdated
}; | ||
|
||
CCITTFaxStream.prototype.readBlock = function() { | ||
this.ensureBuffer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this method as in previous version for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reverted (most of) this, with the exception of the lookChar
-> readNextChar
rename and the handling of -1
return values.
src/core/jbig2_stream.js
Outdated
configurable: true, | ||
}); | ||
|
||
Jbig2Stream.prototype.ensureBuffer = function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this was a hack: can we add req
parameter and probably a comment that we just ignoring interface and moving all parsed data into the buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a pre-existing issue, which is shared between JpegStream
, JpxStream
, and Jbig2Stream
, could we perhaps do that in a follow-up instead?
The interface of all of those streams look kind of weird, and I'm actually a bit surprised that there hasn't been any errors because of it. For example: None of them actually implement readBlock
methods, and it seems more luck that anything else that we're not calling getBytes()
(without providing a length) for those streams, since that would trigger a code-path in getBytes
that assumes readBlock
to exist.
One simple solution might to just replace ensureBuffer
with readBlock
for the JpegStream
, JpxStream
, and Jbig2Stream
streams, which seems more correct to me!?
Anyway, since this is an old issue, I'm hoping this point can be deferred to another PR.
…o separate files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what I had in mind. It's good to go after comments addressed.
src/core/ccitt.js
Outdated
function CCITTFaxDecoder(source, options = {}) { | ||
if (isStream(source)) { | ||
this.source = source; | ||
} else if (source instanceof Uint8Array) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this branch of code for now (and isStream check) and keep only this.source = source; in this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, fixed now.
CCITTFaxStream.prototype.readBlock = function() { | ||
while (!this.eof) { | ||
let c = this.ccittFaxDecoder.readNextChar(); | ||
if (c === -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "if" is not present in the original code. Do you know what is changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the old code was actually wrong here, since lookChar
used to return null
when EOF was reached. Hence we would actually attempt to insert null
into this.buffer
here, since the !this.eof
check didn't occur until the next loop iteration.
In the new code, we first of all need to set this.eof = true;
in the CCITTFaxStream
, since various stream methods expect that property to exist (and CCITTFaxDecoder.eof
ought to be left alone here).
Secondly, with the refactoring readNextChar
will return -1
in the EOF case, and we really don't want insert that into this.buffer
. (Considering that setting uint8array[i] = -1
will lead to uint8array[i] === 255
.)
src/core/ccitt_stream.js
Outdated
if (!isDict(params)) { | ||
params = Dict.empty; | ||
} | ||
this.ccittFaxDecoder = new CCITTFaxDecoder(str, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking we could wrap str
into source
above, e.g. const source = { next() { return str.getByte(); } };
, but this will work fine for now. Next CCITTFaxDecoder user need to follow the same interface of { getByte() { ... }}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking we could wrap str into source above, e.g. const source = { next() { return str.getByte(); } };, but this will work fine for now.
That seems like a nice idea, so I figured that it cannot hurt to implement that while we're at it.
…ecoder` "class", which the new `CCITTFaxStream` depends on
I had a reason why I "faked" a Stream object in my PR #8991: I need to know the byte offset at which the CCITTFax decoding ends. When decoding MMR-encoded HalftoneRegions, there are many MMR images in a stream and they are separated only by EOFB codes. The JBIG2 data doesn't specify the start and end offset of each MMR halftone image. When I use the same Stream object to decode all the MMR bitmaps of a HalftoneRegion, the Stream remembers the end offset of each MMR image, so the decoding of the next one can continue right after it. Would it be possible to retain this behavior in the new |
@janpe2 Regarding #9046 (comment): The latest version of the PR now implements #9046 (comment), see also https://github.com/mozilla/pdf.js/pull/9046/files#diff-15fc1c0e1016b6c65b03effdef7cc5d9R17. |
Yes, looks good. Thank you. |
/botio test |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.215.176.217:8877/2966c0e60a17d92/output.txt |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.67.70.0:8877/dfb4b153ada88b1/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/dfb4b153ada88b1/output.txt Total script time: 16.95 mins
|
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/2966c0e60a17d92/output.txt Total script time: 22.88 mins
|
…-refactor Extract the actual decoding in `CCITTFaxStream` into a new `CCITTFaxDecoder` "class", which the new `CCITTFaxStream` depends on
Please note: This PR attempts to implement #8991 (comment), in an effort to help with (quickly) unblocking the actual JBig2 changes proposed in PR #8991.
@yurydelendik Is this approximately what you had in mind?