Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement guards for stringToBytes and bytesToString #5600

Merged
merged 1 commit into from
Jan 6, 2015

Conversation

timvandermeij
Copy link
Contributor

This makes it more clear where the problem could be for broken PDFs (as was the case for #5599).

@@ -454,7 +454,7 @@ function bytesToString(bytes) {
}

function stringToBytes(str) {
var length = str.length;
var length = (str ? str.length : 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since str === undefined probably indicates that there is something wrong with the PDF file, should we print a warning in this case?
Otherwise we might make debugging some future issues harder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Missing any indication of where the problem occurred made finding the issue harder. I'll add that.

@@ -454,7 +454,10 @@ function bytesToString(bytes) {
}

function stringToBytes(str) {
var length = str.length;
if (!str) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty string will check this condition. Is this behaviour desired ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say yes. If we check the usage of stringToBytes using, for example, https://github.com/mozilla/pdf.js/search?utf8=%E2%9C%93&q=stringToBytes, none of those usages should be dealing with the empty string. Even if the empty string is passed, the function will still work, it will just output a warning to the console as an indication that something might be wrong. A regular user won't see that and it allows us to look into it further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it called often? maybe we then should not do a double checking here and the ternary below but return here a new Uint8Array(0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodingFabian You are right. I thought about that before, but I didn't check how often it was called so I went for this solution. For http://www.etsi.org/deliver/etsi_ts/151000_151099/15101001/10.03.00_60/ts_15101001v100300p.pdf, a large PDF document, it is called almost 100 times per page. I will change that, thanks!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodingFabian +1, seems a little more efficient.
@timvandermeij thanks for the answer !

@Snuffleupagus
Copy link
Collaborator

(Sorry about the wall of text!)

I've been thinking more about this, and I'm no longer sure if warn is such a great idea here. Perhaps we should use error, or simply assert(str, ...), instead.

My thinking is that if a PDF file is sufficiently broken, or if our code contains an error, we should ideally have failed before attempting to call stringToBytes(undefined).
From a user perspective a warning could be problematic, since there is every chance that a part of the file won't render properly without the user noticing. (Since I'm assuming that most users aren't looking at the console messages).
By throwing an error here we would fail early, and in a very simple way. If we just issue a warning and then try to continue parsing/rendering the file, we're basically moving the error to another part of the code. This being done under the (implicit) assumption that some other part of the code will catch any subsequent errors.

@timvandermeij
Copy link
Contributor Author

@Snuffleupagus True, but triggering error would cause the fallback bar to appear, right? Is that always what we want here? If stringToBytes is called with undefined or null, it might not be a breaking error as we have seen for #5599. If we catch undefined there, the document renders perfectly, so I'm a bit worried that we might be bothering the user with error messages even if the document renders perfectly fine. I thought warn is good for us because we do not bother the user with any messages and if there is a problem and it is reported to us, then when we can see from the console that stringToBytes is at fault.

@Snuffleupagus
Copy link
Collaborator

but triggering error would cause the fallback bar to appear, right?

Yes, that was kind of my point.

it might not be a breaking error as we have seen for #5599.

That was true for one particular issue, but in general we have no way of knowing that!
As I said, I'm not so sure that it's a great idea to just "move" an error to another part of the code base.

so I'm a bit worried that we might be bothering the user with error messages even if the document renders perfectly fine.

Calling stringToBytes(undefined) should be sufficiently uncommon that this isn't really an issue, in my opinion.

@timvandermeij
Copy link
Contributor Author

@Snuffleupagus I think you're right about that, so I have changed the commit to use error instead of warn. Is it good this way?

@@ -454,6 +454,10 @@ function bytesToString(bytes) {
}

function stringToBytes(str) {
if (!str) {
error('Invalid or empty string passed to stringToBytes');
return new Uint8Array(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using error, this line is no longer necessary since it won't be reached.

Also, I think this patch could be a one-liner if you instead just did:
assert(str, 'Invalid or empty string passed to stringToBytes');

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, assets wraps that for us. Really nice, it's a one-liner now :)

@@ -454,6 +454,7 @@ function bytesToString(bytes) {
}

function stringToBytes(str) {
assert(str, 'Invalid or empty string passed to stringToBytes');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a stupid suggestion on my part, since an empty string should obviously be considered valid as well (this would explain the test failures, since str === '' at one point). Sorry about that!

Using typeof str === 'string' as condition would probably work, but given how often stringToBytes seems to be used, that might slow things down. I'm not sure what the best solution is, unfortunately :-(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

if (str === null || str === undefined) {
  error('Invalid argument passed to stringToBytes');
}

which should be equal to:

assert(str === null || str === undefined, 'Invalid argument passed to stringToBytes');

That way we only check for undefined and null and leave the rest (including the empty string) through.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typeof str is nice, since it actually checks that it's a string. But if the result of micro benchmarks is to be trusted, the performance is unfortunately a bit varied in different browsers: http://jsperf.com/typeof-vs-undefined-null. Just do what you want here :-)

@timvandermeij timvandermeij force-pushed the str-undefined branch 2 times, most recently from 0aed2b3 to f27cd69 Compare January 1, 2015 18:05
@@ -454,6 +454,8 @@ function bytesToString(bytes) {
}

function stringToBytes(str) {
assert(str !== null && str !== undefined,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it: assert(typeof str === 'string', ...);

Also bytesToString looks non-symmetrical now :) can we add asset(bytes !== null && typeof bytes === 'object' && bytes.length !== undefined, ...);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the most recent commit.

@timvandermeij timvandermeij changed the title Do not call str.length if str is undefined Implement guards for stringToBytes and bytesToString Jan 2, 2015
@timvandermeij
Copy link
Contributor Author

/botio-linux preview

@pdfjsbot
Copy link

pdfjsbot commented Jan 2, 2015

From: Bot.io (Linux)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/626cf9f48d3b201/output.txt

@timvandermeij
Copy link
Contributor Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Jan 2, 2015

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/9c132cad5578bc5/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Jan 2, 2015

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/7c2d07b52074ddb/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Jan 2, 2015

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/9c132cad5578bc5/output.txt

Total script time: 17.44 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Jan 2, 2015

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/7c2d07b52074ddb/output.txt

Total script time: 23.00 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

yurydelendik added a commit that referenced this pull request Jan 6, 2015
Implement guards for stringToBytes and bytesToString
@yurydelendik yurydelendik merged commit 34bed63 into mozilla:master Jan 6, 2015
@yurydelendik
Copy link
Contributor

Thanks

@timvandermeij timvandermeij deleted the str-undefined branch January 6, 2015 22:29
speedplane pushed a commit to speedplane/pdf.js that referenced this pull request Feb 24, 2015
Implement guards for stringToBytes and bytesToString
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants