Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set body with byte reading support #1593

Merged
merged 11 commits into from
Feb 20, 2023
Merged

Set body with byte reading support #1593

merged 11 commits into from
Feb 20, 2023

Conversation

saschanaz
Copy link
Member

@saschanaz saschanaz commented Jan 13, 2023

Fixes #267, paired with w3c/FileAPI#188

The switch is quite simple it seems 👀

(See WHATWG Working Mode: Changes for more details.)


Preview | Diff

@domenic
Copy link
Member

domenic commented Jan 13, 2023

You should probably have something similar to https://wicg.github.io/serial/#readable-attribute , which uses https://streams.spec.whatwg.org/#readablestream-current-byob-request-view to require that the implementation only pull the requested number of bytes from the network, and then write into the current BYOB request view when applicable.

@saschanaz
Copy link
Member Author

Hmm, so currently Fetch has "If stream doesn’t need more data ask the user agent to suspend the ongoing fetch.", while Web Serial has:

  1. Let desiredSize be the desired size to fill up to the high water mark for this.[[readable]].
  2. If this.[[readable]]'s current BYOB request view is non-null, then set desiredSize to this.[[readable]]'s current BYOB request view's byte length.

I guess I should revise the sentence to also check whether the current byob request view is null and suspend the fetch if it is. Am I understanding correct?

And if it's correct, should need more data be revised for byte streams?

@domenic
Copy link
Member

domenic commented Jan 13, 2023

I guess I should revise the sentence to also check whether the current byob request view is null and suspend the fetch if it is. Am I understanding correct?

Not really.

I guess there are two incompatible models in play here, and we have to pick a path.

  1. Rely on buffering somewhere else in the stack. In this model, we'd:

    • Set highWaterMark = 0
    • Assume that as data comes in from the network, it's getting stored in a buffer/queue somewhere else in the stack, e.g. kernel, networking library, etc.
    • Wait for pulls. When we get a pull, either:
      • BYOB request is null, and so we create a new Uint8Array, grab whatever's in the other-part-of-the-stack buffer, put it into that Uint8Array, and enqueue it. (If nothing's there now, wait until something is.)
      • BYOB request is not null, and so we grab byobRequestView.byteLength bytes from the other-part-of-the-stack buffer, and write it into byobRequestView.
    • Write out some vague spec text about how we're assuming that stuff accumulating in the elsewhere-in-the-stack buffer causes backpressure, and thus will automatically suspend/resume bytes accumulating in that buffer, probably in a gradual fashion due to how TCP works.
  2. Rely on the stream's internal queue. In this model, we'd basically do what your current draft of the PR does, or the current spec does. We only use enqueue, ignoring the current BYOB request. We'd use "suspend" and "resume" primitives to interface with the network layer, based on when the stream's internal queue is too full (i.e., when "need more data" is false).

There is no real observable difference between these for JavaScript developers. So maybe (2) is fine, perhaps supplemented with a note. But in terms of how they communicate strategies to implementers, (1) seems better. With (2), if you implemented it verbatim, you'd be doing (unobservable) copies all over the place, from the Uint8Arrays that have accumulated in the stream's internal queue via enqueue, into the BYOB request buffers the web developer supplies.

This is a bit more complicated because from what I understand, (1) is not really implementable directly on top of low-level OS socket APIs. This is because in that framing, the "buffer somewhere elsewhere in the stack" is the kernel buffer, and that has a limit, so you could eventually lose data if backpressure isn't respected fast enough. So really there's a hidden middle layer which wraps the OS socket APIs with an unlimited buffer, I think. But it's been many years since I investigated this and I'm not sure I ever fully understood it, so, I might not have that part correct...

((2) is definitely not implementable directly on top of low-level APIs, because there is no "suspend" or "resume" socket API. In some sense that is bad, but in some sense it's at least honest about how high-level it is.)

Anyway, I think thoughts from implementers on what would be the most helpful would be appreciated. /cc @ricea

@saschanaz
Copy link
Member Author

saschanaz commented Jan 13, 2023

cc @evilpie and @mgaudet (and perhaps @jesup?)

@ricea
Copy link
Collaborator

ricea commented Jan 13, 2023

@domenic (1) suits me, because it matches the way we'll actually implement it.

If you're reading directly from a kernel buffer and we're using TCP then the kernel will ensure that you don't lose data, applying backpressure if you stop reading for a while.

However, in practice we're almost always using TLS, so there's a TLS library also buffering between us and the kernel. It will also be careful not to drop data and will apply backpressure when its finite buffer is full, so no problem there.

If we're using UDP, then since this is Fetch I assume we're using QUIC. QUIC provides in-order reliable streams, so you won't lose data. I'm pretty sure backpressure works similarly to TCP within a stream.

In Chrome we have a bunch of extra abstraction layers, but I assume you're thinking about server software that runs closer to the OS.

@saschanaz
Copy link
Member Author

saschanaz commented Jan 19, 2023

Okay, so I asked around, checked the implementation, and I'm now confident that option 1 matches our behavior. I tried to spec that, except the backpressure thing. I think that one is more of the description for suspend/resume (and thus kinda out of scope of this PR).

Edit: The PR somehow copied "Let buffer an empty buffer that can have ... appended to it" from infra, and used "append" and "extract" against that without the definitions. Infra doesn't really define "append" against buffer either, so maybe it's okay?

fetch.bs Outdated
<li>[=ReadableStream/Enqueue=] |view| into |stream|.

<li><p>If |stream| is [=ReadableStream/errored=], then [=fetch controller/terminate=]
|fetchParams|'s [=fetch params/controller=].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this can happen inside pull. I think it needs to be separate, something like, if the network blows up, terminate the controller and error the stream.

I guess below there is already a line that does that, but it doesn't error the stream? Is there some way in which terminating the controller errors the stream, which I haven't seen?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure either and honestly just copied this from the previous steps which was also right after the enqueueing step.

Looking at the Gecko call diagram, it seems ReadableByteStreamControllerRespond can lead to ReadableByteStreamControllerEnqueueClonedChunkToQueue which can error the stream on a buffer clone failure. It's complex enough, I'd be happy if the spec can list the possible error reasons.

I think it needs to be separate, something like, if the network blows up, terminate the controller and error the stream.

Yes, but I think that's a lil bit out of scope here as I see no relevant existing step, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if the lack of proper erroring step is a preexisting problem then we don't need to fix it here.

I do think this step makes very little sense, so I'd prefer to remove it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copypasting my question from Matrix: https://matrix.to/#/!AGetWbsMpFPdSgUrbs:matrix.org/$1y0cHr7B913RupwXNW93uLA06GT5Uls6nV8Ds8khB80?via=matrix.org&via=mozilla.org&via=igalia.com

r = new ReadableStream({
  async pull(c) {
    await new Promise(r => setTimeout(r, 100));
    c.byobRequest.respond(512);
  },
  type: "bytes"
});
reader = r.getReader({mode: "byob"});
reader.read(new Uint16Array(1024));
setTimeout(() => reader.releaseLock(), 5);

This eventually hits https://streams.spec.whatwg.org/#abstract-opdef-readablebytestreamcontrollerenqueueclonedchunktoqueue which theoretically can error the stream. Can Fetch really ignore this?

fetch.bs Outdated
|fetchParams|'s [=fetch params/controller=].

<li><p>If |stream| doesn't [=ReadableStream/need more data=], ask the user agent to
[=fetch/suspend=] the ongoing fetch.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this model, we don't consult "need more data" at all. We should instead have something vague near the definition of buffer about how we expect that the buffer getting too full / too empty will suspend/resume.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was lazy and hoping I could just reuse the existing things... but you're right, since the HWM is now zero the desired size cannot be a positive number. I'll try adding some notes below buffer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I instead moved this step back to the fetching algorithm and made it to sleep if the buffer becomes larger than a user-agent defined limit. I think that works too?

(It's me who just don't want to describe about the network layer which is not exactly in my area)

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this approach to the network-stack buffer, modulo my comment about a nonzero lower limit

fetch.bs Outdated

<li>Return |promise| and run the remaining steps [=in parallel=].

<li>If |buffer| is empty and the ongoing fetch is [=fetch/suspend|suspended=],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be required to be empty before you resume. So you should add a similar user-agent-defined lower limit at which it can resume.

fetch.bs Outdated
<ol>
<li>Let |promise| be [=a new promise=].

<li>Return |promise| and run the remaining steps [=in parallel=].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been avoiding this pattern these days, and instead using proper nesting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular you want "Run these steps in parallel:", then the nested steps, and then "Return promise.".

fetch.bs Outdated

<li>Let |desiredSize| be |available|.

<li>If |stream|'s [=ReadableStream/current BYOB request view=] is non-null, then set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current BYOB request view should not be accessed in parallel; nor should you create Uint8Arrays. You could probably fix this by posting a fetch task back after the waiting is done.

However, I'm not sure this is worth fixing, given the general problems with Streams being very JSey but being used even for no-JS-involved fetches (e.g. those who use a parallel queue). Thoughts from @annevk appreciated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I fixed it to queue a task as it was simple enough, I can revert it if Anne disagrees.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to do the right thing where we can, but I also don't mind if we take shortcuts when it cannot be observed. Although at some point we'll have to clean it all up.

fetch.bs Outdated
<li>[=ReadableStream/Enqueue=] |view| into |stream|.

<li><p>If |stream| is [=ReadableStream/errored=], then [=fetch controller/terminate=]
|fetchParams|'s [=fetch params/controller=].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if the lack of proper erroring step is a preexisting problem then we don't need to fix it here.

I do think this step makes very little sense, so I'd prefer to remove it.

@annevk annevk requested a review from domenic February 6, 2023 12:12
Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nit

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll trust @domenic got the logic right. I have a few editorial comments and then it seems this can go in.

Will you file an MDN issue as well if needed?

fetch.bs Outdated

<li><p>If |stream|'s [=ReadableStream/current BYOB request view=] is non-null, then set
|desiredSize| to |stream|'s [=ReadableStream/current BYOB request view=]'s [=BufferSource/byte
length=].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No newlines inside terms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's not what the Great Rewrapper does, but okay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's Dominic's tool that's a known issue I think. HTML has a different convention that cares less about inline search. I want inline search to be as straightforward as possible.

@saschanaz
Copy link
Member Author

The build failure doesn't seem relevant 🤔

@saschanaz
Copy link
Member Author

Filed mdn/content#24453

@annevk annevk added the do not merge yet Pull request must not be merged per rationale in comment label Feb 15, 2023
@annevk
Copy link
Member

annevk commented Feb 15, 2023

This is blocked on speced/bikeshed#2471, but otherwise ready to go.

@saschanaz saschanaz closed this Feb 16, 2023
@saschanaz saschanaz reopened this Feb 16, 2023
@saschanaz saschanaz mentioned this pull request Feb 16, 2023
@saschanaz saschanaz closed this Feb 16, 2023
@saschanaz saschanaz reopened this Feb 16, 2023
@saschanaz saschanaz closed this Feb 17, 2023
@saschanaz saschanaz reopened this Feb 17, 2023
@annevk annevk removed the do not merge yet Pull request must not be merged per rationale in comment label Feb 20, 2023
@annevk annevk merged commit 67d4cde into whatwg:main Feb 20, 2023
@annevk
Copy link
Member

annevk commented Feb 20, 2023

@saschanaz can you file an MDN issue? Also, thanks for making this happen!

@saschanaz saschanaz deleted the byte-stream branch February 20, 2023 09:58
@saschanaz
Copy link
Member Author

Did it already! #1593 (comment)

@annevk
Copy link
Member

annevk commented Feb 20, 2023

Ah! So the reason I asked again is because I go by the checklist in OP as I easily forget. I updated that now to include that reference.

@annevk annevk added topic: streams addition/proposal New features or enhancements labels Feb 20, 2023
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Jun 1, 2023
ReadableByteStream is a variant of ReadableStream specialized for
bytes[1]. Given the performance benefits, this CL adds BYOB support
for Fetch by making Response.body a byte stream to allow for reading
with a bring-your-own-buffer(BYOB) reader.

The corresponding spec PR for this was landed at whatwg/fetch#1593.
Tests for reading from Blob with a BYOB reader were factored out, as
support for that will be implemented in follow-up CLs.

[1] https://streams.spec.whatwg.org/#readable-byte-stream

Bug: 1243329
Change-Id: I381b9f2272a7f1202fa748ae5c039ca0a998de00
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Jun 1, 2023
ReadableByteStream is a variant of ReadableStream specialized for
bytes[1]. Given the performance benefits, this CL adds BYOB support
for Fetch by making Response.body a byte stream to allow for reading
with a bring-your-own-buffer(BYOB) reader.

The corresponding spec PR for this was landed at whatwg/fetch#1593.
Tests for reading from Blob with a BYOB reader were factored out, as
support for that will be implemented in follow-up CLs.

[1] https://streams.spec.whatwg.org/#readable-byte-stream

Low-Coverage-Reason: Behavior changes are covered by WPTs (i.e. response-consume-stream.any.js).
Bug: 1243329
Change-Id: I381b9f2272a7f1202fa748ae5c039ca0a998de00
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Jun 1, 2023
ReadableByteStream is a variant of ReadableStream specialized for
bytes[1]. Given the performance benefits, this CL adds BYOB support
for Fetch by making Response.body a byte stream to allow for reading
with a bring-your-own-buffer(BYOB) reader.

The corresponding spec PR for this was landed at whatwg/fetch#1593.
Tests for reading from Blob with a BYOB reader were factored out, as
support for that will be implemented in follow-up CLs.

[1] https://streams.spec.whatwg.org/#readable-byte-stream

Low-Coverage-Reason: Behavior changes are covered by WPTs (i.e. response-consume-stream.any.js).
Bug: 1243329
Change-Id: I381b9f2272a7f1202fa748ae5c039ca0a998de00
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Jun 1, 2023
ReadableByteStream is a variant of ReadableStream specialized for
bytes[1]. Given the performance benefits, this CL adds BYOB support
for Fetch by making Response.body a byte stream to allow for reading
with a bring-your-own-buffer(BYOB) reader.

The corresponding spec PR for this was landed at whatwg/fetch#1593.
Tests for reading from Blob with a BYOB reader were factored out, as
support for that will be implemented in follow-up CLs.

[1] https://streams.spec.whatwg.org/#readable-byte-stream

Low-Coverage-Reason: Behavior changes are covered by WPTs (i.e. response-consume-stream.any.js).
Bug: 1243329
Change-Id: I381b9f2272a7f1202fa748ae5c039ca0a998de00
aarongable pushed a commit to chromium/chromium that referenced this pull request Jun 2, 2023
ReadableByteStream is a variant of ReadableStream specialized for
bytes[1]. Given the performance benefits, this CL adds BYOB support
for Fetch by making Response.body a byte stream to allow for reading
with a bring-your-own-buffer(BYOB) reader.

The corresponding spec PR for this was landed at whatwg/fetch#1593.

[1] https://streams.spec.whatwg.org/#readable-byte-stream

Low-Coverage-Reason: Behavior changes are covered by WPTs (i.e. response-consume-stream.any.js and Blob-stream.any.js).
Bug: 1243329, 1189621
Change-Id: I381b9f2272a7f1202fa748ae5c039ca0a998de00
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4573009
Commit-Queue: Nidhi Jaju <[email protected]>
Reviewed-by: Kent Tamura <[email protected]>
Reviewed-by: Adam Rice <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1152290}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addition/proposal New features or enhancements topic: streams
Development

Successfully merging this pull request may close these issues.

Use a ReadableStream with byte source (formerly called ReadableByteStream) for .body
4 participants