Generalized and precise flow/buffer control #301

tyoshino · 2015-03-18T05:34:50Z

Before recent redesign, we always had a queue in the JS world behind a readable stream, and we calculated how to exert backpressure to the "underlying source" based on the size of the queue.

We're now separating the streams API semantics visible to the consumer (mainly implemented in the reader) and a queue behind it which doesn't always exist (depending on who (JS, C++ code, kernel, device, etc.) buffers data). This made me think more generally, that is, there're many points of buffering behind streams. The number of such points depends on which API we're using, which platform we're using, etc.

It sounds like both the high water mark & strategy concept, and the precise flow control stuff I proposed only make sense for streams with a queue in JS world. It seems it's hard to well-define queueSize, highWaterMark, pullAmount, etc. as cross-platform, cross-API feature.

To put it differently, for non-JS-queue-backed APIs, what do we guarantee for the consumer when HWM is set to X?

I still guess some kind of best-effort approach is still useful. The UA tries to keep the number of bytes buffered on the path between the ultimate data source (disk, the peer server, etc.) and the JS world. As Domenic originally suggested, the strategy itself should be chosen intelligently by the UA considering memory size, CPU power, browser architecture, etc. etc. and it's responsible only for computing how much data is buffered in the JS world. Combining buffer size info (C++, kernel, JS, ...), the UA would intelligently limits the amount of data to pull.

And, the pullAmount or something in the proposed precise flow control APIs will just give a hint about how quickly the consumer can consume data in the form of the number of bytes. It is not directly connected with such as the size of the queue in JS. The stream utilizes the info using some algorithm to control amount of data to pull.

The text was updated successfully, but these errors were encountered:

tyoshino · 2015-03-18T05:45:01Z

As Yutaka has been advocating, we should stop attempting to specify direct connection between the "underlying source" which the JS code faces and the readable (byte) stream operation. Not directly, but just have a good API surface that could be utilized for optimization if the implementor of the readable (byte) stream wants. Bring your own buffer style reader allows the implementor to directly connect read(view) with ReadFile, etc, but it's not required. As we discussed offline, the JS library may be requested to drain data from underlying source into an ArrayBuffer in JS given some consuming pressure. In such case, we don't pass the view provided by the consumer to ReadFile, but copies data already read out to the provided view.

domenic · 2015-03-18T07:14:23Z

I agree with the first post. In #294 (comment) I summed it up in terms of a proposed .getReader({ queueSize: x }) API. And yeah, it is best-effort, with the underlying source implementation taking it into account to decide what to actually do.

On the second post, I also agree that there should be no necessity of connecting the view directly to ReadFile or similar. But just in case it wasn't clear, I think it's important to design the stream + underlying source constructor parameter combo so that you could code the underlying source purely in JS. That is, let's say we had almost-direct JavaScript bindings of syscalls (like this for files or this for sockets). Good underlying source code would still have to do more work, e.g. if the socket buffer is overflowing then the underlying source needs to copy into user-space and then later instead of reader.read() going directly to the syscall it copies the previously-saved buffer. But the important thing is that you could in theory express it all in JS using the underlying source constructor parameter + appropriate syscall bindings. So in this world e.g. underlyingSource.read(view) would have code to check any previously-copied stuff, and if not would call through to the syscall. Make sense?

tyoshino · 2015-03-18T08:36:11Z

#294

Oh, yes. Sorry I forgot to cite that comment.

I think it's important to design the stream + underlying source constructor parameter combo so that you could code the underlying source purely in JS. ...

Agreed. Thanks for clarification.

domenic · 2015-03-20T06:39:59Z

My current thinking based on discussions this morning:

pull(enqueue, close, error) becomes pull(enqueue, close, error, spaceLeftGetter) in functionality. (In name and shape, maybe pull({ enqueue(), close(), error(), desiredSize })? Can be settled afterward though.)
Strategies need to be modified to have high water mark directly instead of having flexible shouldApplyBackpressure predicate. (Maybe we allow it to be updated, but unsure by who or how.)
The when-should-we-call-pull logic stays similar to how it is now, with one addition:
- After start
- If queue size changes, up or down, and either:
  - spaceLeft = HWM - queueSize > 0, or
  - there are pending reads
- Promise-debounced

With this change, and a strategy that says HWM = 0, read() will directly correspond to pull(), which is good. (The "if there are pending reads" ensures it.)

With a strategy that has HWM > 0, then pull will be called whenever the queue size is less than the high water mark, and it will be called with spaceLeft (or "desiredSize" maybe) = HWM - current queue size, which allows it to make intelligent decisions about how to generate chunks that it will enqueue.

tyoshino · 2015-03-23T08:11:34Z

Some logs of the last week's discussion

For the initial release, read method is the only way for the consumer to tell the stream how fast the consumer is / how much the consumer is fine with getting buffered. Readable stream doesn't expose setHWM to the consumer.
- [tyoshino] Re: naming, I prefer pullAmount (or something similar) to HWM as HWM sounds like assuming that there's a queue.
- [tyoshino] If we add a similar method to the writable side, it should use "space" not "queueSize" for its name. We have many queues behind a writable stream (JS, C++, kernel, device, ...). If we say "queue", we need to tell what the "queue" means and forces the consumer to think of the details behind the writable stream.

Instead of creating enqueue, close, and error functions per-stream, and passing (some subset of them) to the underlying source's start() and pull() methods, we instead create an instance of the ReadableStreamController class, which has methods enqueue(), close(), and error(). This closes #309, which contains more discussion and justification. It sets the stage for #301 (which will work by adding another property to the ReadableStreamController class).

domenic · 2015-04-04T00:04:29Z

I'm working on this in the precise-flow-control branch. It's going pretty well. Gotta update some more tests (especially around writable streams) and the spec, but shouldn't be too bad.

I went with desiredSize for the name for now. It seems OK. Other candidates included extraSpace or (space|size)UntilFull.

This changes the way queuing strategies work, and how backpressure signals are propagated from readable stream controllers, in the service of giving more precise information to be used for flow control. Queuing strategies now explicitly embrace the concept of "high water mark," instead of having a general `shouldApplyBackpressure` method. This allows us to calculate the "desired size to fill a stream's internal queue", and expose it as a `desiredSize` property on the readable stream controller. The algorithm for determining whether to call pull was updated, per #301 (comment), to no longer be based on the boolean "should apply backpressure" logic. The logic is now to avoid calling pull if desired size is <=0 (unless there are pending read requests, in which case pull should be called anyway). In particular, this modifies test expectations in some cases where start enqueues chunks; after this change pull is no longer automatically called after start in such cases. The API changes in detail: - The readable and writable stream constructors now validate the passed queuing strategy and extract its values. - Queuing strategies are no longer treated as instances of classes, but instead as property bags. (That is, we no longer "invoke" `strategy.size`, but instead simply extract the `size` function from strategy and "call" it.) - The built-in queuing strategies no longer validate the passed high water mark argument and store it in an internal slot. Instead they simply define a data property on themselves whose value is the passed high water mark. - Queuing strategies no longer have `shouldApplyBackpressure` methods. - `ReadableStreamController.prototype` gained a `desiredSize` getter. - `ReadableStreamController.prototype.enqueue` (and EnqueueInReadableStream) no longer return a boolean to signal backpressure; instread consult `controller.desiredSize` for more precise information. The correspondence is: `enqueueReturnValue === controller.desiredSize > 0`. Note that writable streams do not yet expose the desired size. We expect to do that later, probably in conjunction with related changes. For now, they translate the desired size into a boolean distinction between "waiting" and "writable", giving the same results as before. Fixes #301. Addresses half of #119.

tyoshino mentioned this issue Mar 18, 2015

Byte streams tracking bug #300

Open

21 tasks

domenic added the queuing strategies label Mar 18, 2015

domenic modified the milestone: Week of 2015-03-30 Mar 30, 2015

domenic mentioned this issue Mar 30, 2015

Making the "queue manipulator" its own class #309

Closed

domenic mentioned this issue Mar 31, 2015

Introduce readable stream controller class #310

Merged

domenic modified the milestones: Week of 2015-03-30, Week of 2014-04-06 Apr 7, 2015

domenic self-assigned this Apr 7, 2015

domenic mentioned this issue Apr 7, 2015

Do writable streams need separate signals for write-success vs. write-accepted? #316

Closed

domenic closed this as completed in 966f19e Apr 20, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generalized and precise flow/buffer control #301

Generalized and precise flow/buffer control #301

tyoshino commented Mar 18, 2015

tyoshino commented Mar 18, 2015

domenic commented Mar 18, 2015

tyoshino commented Mar 18, 2015

domenic commented Mar 20, 2015

tyoshino commented Mar 23, 2015

domenic commented Apr 4, 2015

Generalized and precise flow/buffer control #301

Generalized and precise flow/buffer control #301

Comments

tyoshino commented Mar 18, 2015

tyoshino commented Mar 18, 2015

domenic commented Mar 18, 2015

tyoshino commented Mar 18, 2015

domenic commented Mar 20, 2015

tyoshino commented Mar 23, 2015

domenic commented Apr 4, 2015