-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Worker implementation #3 #635
Conversation
…oesnt have a document to attach the canvas to
…tform + has no mask
…ttern as toIR is called all the time
@pdfjsbot test |
Processing command test by user jviereck. Queue size: 0 Live script output is available at: http://184.73.87.52:8989/2370679.txt [bot:processed:2370679] |
ERROR(s) found Output:
Bot response time: 30.00 mins |
@cgjones: Once you make the review of this, you might want to take a look at the resulting diffs instead of the commits. I've done some back and forth on the font loading to come to the point where I've reverted the entire changes done to font.js and use the current version upstream basically. Starting from commit be73cb4, you can review the commits one by one as there is no reverting of code from that point on. Sorry once again for how the code is the way it is :( |
// exc = e.toString(); | ||
// continuation(exc); | ||
// throw e; | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this code commented out? Is this temporary?
Julian, I've gone over some of the patch. First my opinion on how to handle the patch, then the nitty gritty. A lot of this seems to be work-in-progress so it was pretty hard to follow the logic. My 'business' mind would prefer to see a major patch like this to at least come with an equally large payoff, but my understanding is that we can't yet enable workers to enjoy the performance gain, as you're still sorting out some issues. So it's a bit of a risk to land a large, partially complete patch in our code base without the actual benefit in sight. (If it only takes a small additional patch to get the whole thing working, then I'd personally prefer to wait until we can actually see the performance gain before landing the patch). So my suggestion would be to either (a) wait for a patch with workers working and less partially-done code, or (b) use your current pull request as reference for a more incremental contribution, such that at every stage the purpose of the change is clear and potentially useful in its own right. (That could become a goal for next quarter, in which case I'd be happy to take on the challenge if you're tired of working on it). Again that's just my opinion, but @cgjones is obviously much better equipped to make a decision here. As to the details, here are some of my questions/points. Again, I've found it pretty hard to follow the patch, and the nomenclature was a bit confusing, but maybe some of this is because things are not quite ready yet?
Anyway, I look forward to hearing from everyone else here. Again, Brendan and I will be doing a major file restructuring off the master branch next week in Mountain View; it'd be kinda painful to bring your changes into the new files if the PR doesn't land before then. It'd be nice to know what's the final word on this. |
No, that wasn't the point. Right now we create the PDF document a few times and there is some refactoring needed such that we need to store the PDF only once.
|
}); | ||
} | ||
setGState: function canvasGraphicsSetGState(states) { | ||
for (var i = 0; i < states.lenght; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'lenght' that worries me that this passed the extgstate pdf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see now extgstate is a load test, where it should be an eq test. Let me go fix that.
I just wanted to put into writing what we discussed over the phone. In addition to my points above:
|
@jvierek I apologize for the lag again. I've barely been keeping my head above water lately. Architecturally, this approach still looks fine; looks like it hasn't changed from the previous patch. The fixes you've made look good. It looks like the comments about file organization weren't addressed, and I think Artur was confused similarly to me. We need to figure that part out. As for landing, I mostly agree with @arturadib's comments above. We should land this with a functional worker backend. When this architecture runs only on the main thread, I think we'll mostly regress throughput without much responsiveness gain, though there are some pdfs it could improve. @arturadib @brendandahl @notmasteryet how well do you guys understand the architecture these patches are moving towards? One thing we neglected in the previous pull request was getting the core devs on the same page; it's important because this change will permeate a lot of pdf.js, and affect the way patches need to be written. I think the first "action item" is to make sure the new architecture is documented and understood. @jvierek, want to start with a write-up and then post the link to the mailing list? Next, we should figure out how the worker backend will interact with the code reorganization being planned (I haven't been following the details). Maybe you guys could conf call next week to discuss? With both of those settled, I think it'll be clearer how to get the current patch here ready to land, and it may make sense to distribute some work so that everyone gets a feel for the new backend. Sound good?
Yes, that entirely solves the responsiveness problem. It doesn't mean we don't have to deal with dependencies: we still have to wait on fonts and images to load. I'm not sure that a lot of lines of code could be removed from this patch, but it's likely that it could be made conceptually simpler. Next week may be a good time to figure that out.
Everyone (to a first approximation) has at least 2 HW threads, phones being the exception for the moment. I wouldn't be surprised if 4 (dual-core+HT) is more common than 2 nowadays. You're right that scheduling work is hard. That's a bridge we'll have to cross, and we may need new platform support. But we need to be able to scale up to more HW threads: that's where perf wins for this kind of code will come from from now on. That doesn't mean we should neglect other wins like parallel rendering of individual pages, or same-page optimizations like SIMD. |
okay, I'm going to start from scratch again and create a as-simple-as-possible version that for now runs only in one thread. Closing this pull request as there gonna be a new one. |
Hey Chris, thanks so much for making time for this!
That would be extremely helpful. I know that this will be hard no matter what, but since it'll affect the overall architecture of the code it'd be nice to be able to understand the big picture first. Big thanks to Julian for plowing along and making so much progress already. Because of his work we will all be in a better position to help and make a more educated decision.
Right, since we'll be working on this already next week, maybe @jviereck can postpone writing code until after we're done? @jviereck Meanwhile, as per Chris's suggestion, perhaps you can open an issue like #646 explaining the overall architecture and what files will be affected, to gather feedback?
Makes sense. I guess it'd be helpful to quantify this a bit better, so I've filed a bug with the metrics team: https://bugzilla.mozilla.org/show_bug.cgi?id=694565 As to HTs, beyond freeing up the UI with one additional thread, the benefit from using them remains to be seen, as although they mitigate context-switching, there's still execution-switching and message-passing overhead. In other words, launching a total of 4 threads because we have 2 cores + 2 HTs might be slower than launching only 2. I suppose we won't know until we try! Speaking of which... @cgjones Do you know if we can access the # of cores available via XPCOM or some other browser-specific method? I've found some threads regarding the spec of Web Workers but nothing concrete yet: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-November/024058.html |
@jviereck - in response to your IRC comments (you logged out right when I wrote a response! :)):
Man, that's fantastic you're hungry to finish this! And I do like Chris's idea to first have some sort of documentation explaining the overall architecture to everyone. This is very much in line with what Tom Preston-Werner (Github's founder) asks his own team to do before trying to code anything big that affects other folks: It's a great practice that I've adopted for a while too (hence #646) - I highly recommend reading this post. Do you think you can do that before you get too far along in the code? I'd hate to see you work so hard again before we have a chance to chime in on your plans. What do you think? |
Sounds good. Here we go:
If you are short on time, just start reading form the "Birds eye overview" |
We have internal (C++) APIs for this, but definitely nothing exposed to The Web, and nothing through XPCOM AFAIK. We have to be very careful about exposing that kind of information; ideally we want the browser to choose the #workers allocated to pages. I had a WIP "parallelMap()" kind of API for workers, but ran out of time before anything came of it. For pdf.js, that may or may not be the primitive we want. We'll let the impl drive new APIs/specs. |
This is a merged version of the worker branch that hopefully works. After the merge, the fonts weren't working for some pages so I've redone all the font changes, started from a scratch version of fonts.js from master again an build from there. The font's are currently processed (new Font(...)) on the main thread and not in a worker as well, but let's keep it simple for now and get something landed.
Compared to the wish list from Chris on the other pull request, I've addressed some of the points, but some still stay opened:
PDFDoc
got renamed toPDFDocModel
andWorkerPDFDoc
is nowPDFDoc
. That way, we don't have to change any of the APIs as you just can call the new PDFDoc as you did before.Let's see what the bot says to this patches. Can someone please take a look at the last changes if they make sense?
Added by @arturadib - Links to previous pull requests and discussion: #459 #130 #95 #45