-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement guards for stringToBytes and bytesToString #5600
Conversation
@@ -454,7 +454,7 @@ function bytesToString(bytes) { | |||
} | |||
|
|||
function stringToBytes(str) { | |||
var length = str.length; | |||
var length = (str ? str.length : 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since str === undefined
probably indicates that there is something wrong with the PDF file, should we print a warning in this case?
Otherwise we might make debugging some future issues harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Missing any indication of where the problem occurred made finding the issue harder. I'll add that.
06fca7c
to
d0a0d7f
Compare
@@ -454,7 +454,10 @@ function bytesToString(bytes) { | |||
} | |||
|
|||
function stringToBytes(str) { | |||
var length = str.length; | |||
if (!str) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An empty string will check this condition. Is this behaviour desired ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say yes. If we check the usage of stringToBytes
using, for example, https://github.com/mozilla/pdf.js/search?utf8=%E2%9C%93&q=stringToBytes, none of those usages should be dealing with the empty string. Even if the empty string is passed, the function will still work, it will just output a warning to the console as an indication that something might be wrong. A regular user won't see that and it allows us to look into it further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it called often? maybe we then should not do a double checking here and the ternary below but return here a new Uint8Array(0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CodingFabian You are right. I thought about that before, but I didn't check how often it was called so I went for this solution. For http://www.etsi.org/deliver/etsi_ts/151000_151099/15101001/10.03.00_60/ts_15101001v100300p.pdf, a large PDF document, it is called almost 100 times per page. I will change that, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CodingFabian +1, seems a little more efficient.
@timvandermeij thanks for the answer !
d0a0d7f
to
b437334
Compare
(Sorry about the wall of text!) I've been thinking more about this, and I'm no longer sure if My thinking is that if a PDF file is sufficiently broken, or if our code contains an error, we should ideally have failed before attempting to call |
@Snuffleupagus True, but triggering |
Yes, that was kind of my point.
That was true for one particular issue, but in general we have no way of knowing that!
Calling |
b437334
to
3b57439
Compare
@Snuffleupagus I think you're right about that, so I have changed the commit to use |
@@ -454,6 +454,10 @@ function bytesToString(bytes) { | |||
} | |||
|
|||
function stringToBytes(str) { | |||
if (!str) { | |||
error('Invalid or empty string passed to stringToBytes'); | |||
return new Uint8Array(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using error
, this line is no longer necessary since it won't be reached.
Also, I think this patch could be a one-liner if you instead just did:
assert(str, 'Invalid or empty string passed to stringToBytes');
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, assets
wraps that for us. Really nice, it's a one-liner now :)
3b57439
to
d7f62b6
Compare
@@ -454,6 +454,7 @@ function bytesToString(bytes) { | |||
} | |||
|
|||
function stringToBytes(str) { | |||
assert(str, 'Invalid or empty string passed to stringToBytes'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a stupid suggestion on my part, since an empty string should obviously be considered valid as well (this would explain the test failures, since str === ''
at one point). Sorry about that!
Using typeof str === 'string'
as condition would probably work, but given how often stringToBytes
seems to be used, that might slow things down. I'm not sure what the best solution is, unfortunately :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this?
if (str === null || str === undefined) { error('Invalid argument passed to stringToBytes'); }
which should be equal to:
assert(str === null || str === undefined, 'Invalid argument passed to stringToBytes');
That way we only check for undefined
and null
and leave the rest (including the empty string) through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typeof str
is nice, since it actually checks that it's a string. But if the result of micro benchmarks is to be trusted, the performance is unfortunately a bit varied in different browsers: http://jsperf.com/typeof-vs-undefined-null. Just do what you want here :-)
0aed2b3
to
f27cd69
Compare
@@ -454,6 +454,8 @@ function bytesToString(bytes) { | |||
} | |||
|
|||
function stringToBytes(str) { | |||
assert(str !== null && str !== undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it: assert(typeof str === 'string', ...);
Also bytesToString looks non-symmetrical now :) can we add asset(bytes !== null && typeof bytes === 'object' && bytes.length !== undefined, ...);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the most recent commit.
f27cd69
to
6e99c29
Compare
/botio-linux preview |
From: Bot.io (Linux)ReceivedCommand cmd_preview from @timvandermeij received. Current queue size: 0 Live output at: http://107.21.233.14:8877/626cf9f48d3b201/output.txt |
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/626cf9f48d3b201/output.txt Total script time: 0.78 mins Published
|
/botio test |
From: Bot.io (Windows)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://107.22.172.223:8877/9c132cad5578bc5/output.txt |
From: Bot.io (Linux)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://107.21.233.14:8877/7c2d07b52074ddb/output.txt |
From: Bot.io (Windows)SuccessFull output at http://107.22.172.223:8877/9c132cad5578bc5/output.txt Total script time: 17.44 mins
|
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/7c2d07b52074ddb/output.txt Total script time: 23.00 mins
|
Implement guards for stringToBytes and bytesToString
Thanks |
Implement guards for stringToBytes and bytesToString
This makes it more clear where the problem could be for broken PDFs (as was the case for #5599).