Performance criteria & conflicts #108
That's a great list and on first scan I think the only issue I have is with:
WebAssembly clearly enables a simple AOT story. Realistically, I think engines are going to want to use a bunch of variations, mixing baseline compilers (that run AOT, while fully-optimized compilers execute in the background, swapping in when ready), profile-based recompilation, JITish compilation etc to optimize cold load time. (I expect a lot of experimentation in this space as it would be an area to compete on quality-of-implementation.) Many of the not-fully-AOT strategies will trade time-to-first-frame for occasional janks while running. This may be fine for many applications (e.g., that spend the first N seconds in a main menu), but some applications will need full performance on their first frame (imagine a game that starts with an intense rendered scene in the intro and doesn't want the first 20 seconds to be stuttery). It'd be nice to have some knob/option/flag/queryable-state (open for discussion) that lets an application request or test for "full" performance. In the limit, we could standardize a way for applications to annotate individual functions (based, e.g., on PGO) as being cold or requiring full-performance AOT (well after v.1, of course). So perhaps we could say "It is assumed that engines will provide (always, or under program control) AOT compilation of high-performance code." |
I've been thinking along these lines exactly. Maybe we (post-v1, so to speak, but early) provide a simple intrinsic that lets you opt into up-front AOT compilation of a function, or a module, or some other unit of granularity - i.e. don't force an AOT compile before you even hit your loading screen, but be able to ensure that all your hot functions are compiled before you start playing a movie or a game to avoid jank and dropped frames. I think a small hook like that is probably all that would be needed to tackle those scenarios, and it's likely not a blocker. A similar practice here is that many modern games will front-load compilation of pixel shaders to avoid janking when they're first used. In the old days, games would try to force texture uploads onto the GPU to avoid jank from lazy on-demand uploads.
Yeah, definitely sounds like we're thinking along the same lines. I agree we should probably wait to think about this post-v.1, since it'll be better informed by everyone's experience while implementing v.1, and we'll want some time to experiment with what the minimal set of knobs/hooks/annotations is that gives all the necessary control to developers.
From the V8 perspective, the two policies that would be easiest to … That said, I think the future is really bright for dynamic optimization of …
I assume this is one of the major motivating reasons behind wasm? If so, I think it would be useful if this was explained in the FAQ. I think it would also be useful to explain why JavaScript virtual machines consume so much memory when loading large asm.js codebases. For example, if I load the AngryBots-asm.js code into Chrome then memory consumption jumps up by about a gigabyte (and then drops back down again).
@cosinusoidally this is what we mean in the high-level goals when we mention "load-time-efficient". The FAQ does discuss memory usage as it pertains to the polyfill (it compares usage to regular asm.js). The Chrome issue you mention isn't inherent to asm.js, it's the compiler being silly (known V8 bug in this case). It's a difficult problem to fix, but similar kinds of issues could also occur for WebAssembly. Where WebAssembly wins is in simplicity of the format, and shedding some of JavaScript's oddness. That's hard to quantify without getting into nitpicky details that folks argue over, so we've just avoided playing point-the-finger :-)
I commonly see V8 using more heap than is strictly necessary, but from what I understand it uses whatever heap it can to achieve better performance. If you were to decrease the max old space size it would load using less memory, but take longer. This practice is fairly common for devs who run node on a Raspberry Pi and don't have the memory. In regards to wasm, it wouldn't be surprising if it uses more memory than expected. It's not uncommon for node to get issues about potential memory leaks with graphs of large heap usage, when in reality only half of that is in use by objects.
Wasm will not have the parser problem referenced in the Chromium issue.
I did a bit of experimenting and I managed to find a workaround. It turns out you get the huge memory spike if you define all your functions inside the asm.js module closure. When I moved all the function definitions outside the closure, the spike went away. I did this by moving all the functions outside the closure, splitting the module into chunks, converting all the main closure vars into globals, and loading the whole thing inside an iframe. This works, but unfortunately it regresses performance quite a bit. The hacked-together proof of concept is here: https://github.com/cosinusoidally/angrybots-chunked-js . As mentioned above, it turns out that loading the module in chunks wasn't the main issue (so the repo's name is a bit misleading), though chunked loading could prove to be handy. With this proof of concept, the memory-consumption-over-time graphs show quite a significant difference in peak startup memory usage (around a gig, I think). With regards to the polyfill, are there any plans to use something like an interpreter/baseline compiler (written in JavaScript) in order to reduce startup memory usage for browsers without wasm support?
Interesting about the memory usage reduction, though you are probably decreasing throughput that way. Yes, an interpreter for the polyfill is something worth experimenting with. A baseline compiler is actually what the polyfill is - it is just going to write out the wasm into asm.js in a simple way, like a baseline compiler would. Any true optimization would have been done by the compiler emitting wasm, or will be done by the JS VM's JIT.
The memory usage spike is due to the scope analysis of V8 needing the AST …
Sorry for getting into the middle of the discussion, but I'd like to leave a few points. There is a well-known rule of thumb: (performance / memory usage) = constant. It basically says that by increasing memory usage you can raise the theoretical maximum performance, and vice versa: by decreasing memory usage you lower it. At the code level, trading memory for performance usually means using tables of precomputed data, so CPU cycles are not spent on those computations and performance rises. Also, the concept of WebAssembly so far is more similar to the JVM/CLR virtual machines than to JavaScript. As you know, both the JVM and the CLR define their own bytecode, so some optimization techniques can be derived from their implementations.
Closing for now, please create new issues for new discussion.
Good performance (by some set of metrics) is an absolute requirement for us to label a set of decisions as v1, and for us to ship a polyfill and native implementation, as I understand it.
We need to get some clarity on what those metrics are, and try to arrive at an understanding about which metrics matter the most to us. There is a balancing act here: it will not be possible to achieve great results on all of these metrics. Optimizing really well for wire size, for example, can have negative impacts on decode speed, memory usage during decoding, streaming decode performance, etc.
There are also some existing assumptions that we're basing our performance planning on here. Most of these appear to be based on evidence, and we just need to document it. I'll enumerate the ones I've heard:
I'm sure there is more I've overlooked, and I suspect a couple of these are partial understandings on my part. Most of them are things I've heard multiple times, however.
As far as performance criteria go, here are the criteria as I generally understand them:
Once we have a general consensus on all this I'll create a PR to document it.