Support download data asynchronously #166
Conversation
Thanks! I made a few comments about error handling.
<!-- Validate inputs and outputs -->
1. If any of the following requirements are unmet, then [=reject=] |promise| with a {{TypeError}} and stop.
1. If any of the following requirements are unmet, then throw a {{TypeError}} and stop.
Would it be helpful to use different exception types to signal the different ways compute() can fail? Do we expect a web developer to write error-handling control flow that could recover if it knew more about the reason for the failure? If that's a valid use case, then:
We could either use multiple simple exceptions (these match ECMAScript error objects):
https://heycam.github.io/webidl/#dfn-simple-exception
Or DOMExceptions with error names:
https://heycam.github.io/webidl/#idl-DOMException-error-names
For example, we could throw a new "DataError" DOMException to signal value.data errors.
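To illustrate the distinction, here is a hypothetical sketch (not spec text) of how a caller could branch on a simple exception versus a named DOMException. The `validateOutput` function and its rules are invented for illustration; only the TypeError / "DataError" DOMException split mirrors the Web IDL options above:

```javascript
// Hypothetical sketch: distinct exception types let a caller recover
// differently per failure. validateOutput() and its checks are made up
// purely for illustration.
function validateOutput(value) {
  if (typeof value !== 'object' || value === null) {
    throw new TypeError('value must be a dictionary');     // simple exception
  }
  if (!ArrayBuffer.isView(value.data)) {
    throw new DOMException('bad value.data', 'DataError'); // named DOMException
  }
}

function classifyFailure(value) {
  try {
    validateOutput(value);
    return 'ok';
  } catch (e) {
    if (e instanceof TypeError) return 'invalid-argument';
    if (e instanceof DOMException && e.name === 'DataError') return 'invalid-data';
    throw e;
  }
}
```

With this split, a caller could retry with repaired data on "DataError" while treating a TypeError as a programming bug.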
index.bs (outdated diff)
1. If |remainingOutputNames| is empty, then resolve |promise| with |results| and stop.
</div>
1. If there is an error returned by |this|.{{MLGraph/[[implementation]]}}, then:
1. Throw an {{OperationError}} and stop.
Given "OperationError" is a DOMException error name, we should say:
Throw an "OperationError" DOMException and stop.
(For simple exceptions, we can simply say e.g. "Throw a TypeError".)
Thanks @anssiko. I'll fix that once the API shape discussion is settled.
index.bs (outdated diff)
const bufferC = new Float32Array(sizeOfShape([3, 3]));
const outputs = graph.compute(inputs);
await outputs.c.data(bufferC);
This pattern implies that for every compute call, the browser must allocate an intermediate resource for the execution result, since the only pre-allocated output buffer allowed is the buffer used as the download destination. I'm thinking of the case of a GPU implementation, where the execution result would need to be staged in a GPU buffer that can't be pre-allocated according to this change.
Thanks @wchao1115. I think you are right: the pre-allocated output buffers should be provided at the compute call. I've reverted the MLOutput change and kept its usage for pre-allocated output buffers. Following the discussion in #162, I've added the MLTensor interface for the newly allocated output buffers that are returned by the compute call. Please take another look.
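A rough sketch of the two calling patterns being described, using a toy mock in place of a real MLGraph. The doubling computation, MockGraph, and MockTensor are invented for illustration; only the compute-overload shape and the async data() download come from the discussion:

```javascript
// Toy mock of the proposed API shape; not a real WebNN implementation.
class MockTensor {
  constructor(result) { this.result = result; }
  // Async download of the result into a caller-provided buffer.
  async data(dest) { dest.set(this.result); }
}

class MockGraph {
  compute(inputs, outputs) {
    const result = Float32Array.from(inputs.a, x => x * 2); // stand-in math
    if (outputs !== undefined) {
      // Pattern 1: caller pre-allocated the output buffer (MLOutput-style).
      outputs.c.data.set(result);
      return undefined;
    }
    // Pattern 2: the implementation allocates the output, returned
    // synchronously as a tensor that is downloaded asynchronously.
    return { c: new MockTensor(result) };
  }
}
```

Pattern 1 binds a pre-allocated buffer at the compute call (no hidden staging buffer needed); pattern 2 returns newly allocated outputs whose contents are fetched asynchronously via data().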
index.bs (outdated diff)
Promise<MLData> data();
Promise<undefined> data(MLData data);
I think this complicates the GPU case and makes it less efficient, because it essentially requires two buffers to get a GPU execution result: one hidden from the caller and another as the destination of an async download call. This may not be unusual if the scenario calls for an execution result to be in CPU memory, like object detection, etc., but there are also many other scenarios where the output should stay in a GPU buffer, like style transfer or GANs.
On the semantic front, if the data call is going to mean more than a simple data download operation, say, including layout conversion in some implementations, then I have a hard time understanding how the browser would be able to implement this call behind the scenes. That is, how could an implementation of a mere MLOutput struct know how to layout-convert every possible output of every defined operation that has this MLOutput struct bound to it at execution time? It would also mean that any operation can produce an output operand of any layout format, since the real output data can only be obtained from the eventual data call after the graph is executed.
I think this complicates the GPU case and makes it less efficient because it essentially requires 2 buffers to get a GPU execution result, one hidden from the caller and another as a destination of an async download call.
For the GPU case, I agree this would add one buffer of overhead if user code pre-allocates the output buffer. So I've reverted the MLOutput change and kept its usage for pre-allocated output buffers. With that, user code can bind a pre-allocated GPU buffer via MLOutput as the execution output without requiring a hidden one. If the user code doesn't pre-allocate a GPU buffer, it can get the newly allocated output GPU buffer via MLTensor.data().
how could an implementation of a mere MLOutput struct know how to layout-convert every possible output of every operation defined which has this MLOutput struct bound to at execution time?
An implementation could create an MLTensor (in my new commit) that wraps the output object of the native execution API and return it to user code. For example, a oneDNN-based MLTensor implementation could wrap the dnnl_memory that is used as the destination memory of dnnl_primitive_execute. A dnnl_memory may use a blocked layout for performance optimization. When user code calls MLTensor.data(), the implementation could use the reorder primitive to convert the blocked layout of the dnnl_memory to plain layout.
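As a toy illustration of that idea (everything here is hypothetical: the block-reversal "layout" is an invented stand-in for a real backend format such as oneDNN's blocked layouts, and BlockedTensor is not a proposed API):

```javascript
// Toy MLTensor-like wrapper that keeps its result in an internal "blocked"
// layout and reorders to the standard plain layout only inside data().
class BlockedTensor {
  constructor(values, blockSize = 2) {
    this.blocked = [];
    for (let i = 0; i < values.length; i += blockSize) {
      // Pretend the backend stores each block in reverse order.
      this.blocked.push(values.slice(i, i + blockSize).reverse());
    }
  }
  async data(dest) {
    // "reorder": convert the internal blocked layout to plain layout,
    // so the caller always observes the standard layout.
    let i = 0;
    for (const block of this.blocked) {
      for (let j = block.length - 1; j >= 0; j--) dest[i++] = block[j];
    }
  }
}
```

The caller never sees the internal layout; whether a real reorder happens inside data() stays an implementation choice.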
The new change seems to introduce an additional MLTensor into the mix, with another compute overload that has a different semantic. The problem with this change is that the caller will now need to know how webnn is implemented in order to choose the right calling pattern for the compute method. Generally speaking, the hard part of designing an abstraction API like webnn is precisely to make it so that the caller doesn't need intimate knowledge of how the API is implemented in order to call it correctly.
The point I was making earlier, about how an implementation of MLOutput could possibly know how to implement it properly when the layout-conversion semantic is baked into its definition, is about all of the possible outcomes it must produce, not about which platform API the browser must call.
For example, an MLOutput could be bound to a graph that has one of the various activations, a convolution, a pooling, a batchnorm, or an RNN, etc., as the graph's output layer. So if we define MLOutput's behavior so that it knows how to perform layout conversion after any graph is executed, then it effectively must know how any graph produces its output through any operator at the end of the graph. It must also know how to post-process any graph output of any such operator. I think this will be very hard for the implementer to get right on any platform. As a data point, DirectML, for instance, has no concept of layout post-processing after graph execution.
The problem with this change is that the caller will now need to know how webnn is implemented in order to choose the right calling pattern for the compute method.
It's based on whether output buffers are pre-allocated or not. If the caller pre-allocates the output buffers, call compute(inputs, outputs). If the caller doesn't, call outputs = compute(inputs). In the current spec, compute also has an optional outputs argument covering these two usages. The difference is only in the newly allocated outputs returned by compute.
The point I was making earlier about how an implementation of MLOutput could possibly know how to implement it properly when the layout conversion semantic is baked into its definition is about all of the possible outcomes it must produce, not about what platform API, the browser must call.
As I mentioned, the layout conversion should be an implementation choice. From the API surface, the results returned by MLTensor.data() are always in the standard layout defined in the spec. An implementation would not care about layout conversion if the native API it uses doesn't have such a concept, such as DirectML as you mentioned; the webnn-native DirectML backend actually implements it this way. However, this change would allow another implementation to optimize if the native API exposes a platform-optimized layout, such as the oneDNN case I mentioned.
I think sync vs. async is a good discussion point. Supporting both may complicate the implementation in the browser, as they carry different guarantees and require different setups, with different implications for the caller. I personally still prefer that we stick with just one model for consistency.
@RafaelCintron had a great suggestion w.r.t. the sync/async API in the CG call this morning:
RafaelCintron: if we must have async version, perhaps we can have that be strongly type to only take ArrayBuffer and return them
I think it's a good idea, though I'm not sure how to translate it into an API proposal. @huningxin, do you want to try that?
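One possible shape of that suggestion, sketched as a mock. The name computeAsync and the add-one computation are invented; only the "ArrayBufferViews in, ArrayBufferViews out" constraint comes from the suggestion itself:

```javascript
// Hypothetical sketch: an async compute variant strongly typed to accept
// only ArrayBufferViews, so the CPU readback is explicit in the signature.
async function computeAsync(inputs, outputs) {
  for (const [name, view] of Object.entries({ ...inputs, ...outputs })) {
    if (!ArrayBuffer.isView(view)) {
      throw new TypeError(`${name} must be an ArrayBufferView`);
    }
  }
  // Stand-in for graph execution plus async download into the output views.
  outputs.c.set(inputs.a.map(x => x + 1));
}
```

Typing the async path to ArrayBuffer views would keep GPU-resident outputs out of its scope entirely, sidestepping the hidden-staging-buffer concern above.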
Agree, it is a good idea. Thanks @RafaelCintron.
However, since the original intent of this PR was layout conversion and it has touched so many other important topics, I would suggest we handle them in separate issues. What I would like to do is:
- open an issue about native format support for a future version of the spec, and close this PR.
- open an issue for a compute API that only takes pre-allocated output buffers.
- open an issue for a sync API for Wasm library usage ("Support CPU - WebAssembly scenario of the op level execution use case" #156 (comment)).
WDYT? @wchao1115
Are you suggesting that we close this PR with the three issues tracked separately? I'm fine with that too. I can probably tackle the second issue in my next PR.
(We discussed this PR on today's call https://www.w3.org/2021/05/13-webmachinelearning-minutes.html#t11 and agreed to continue and consolidate related discussion here.)
Follow up on the issue discussed in #162 for the operation-level scenario in frameworks that support asynchronous data downloads. The changes include:
- MLTensor interface that supports downloading data asynchronously.
- MLGraph.compute that returns MLTensor synchronously.
Fix #156.