Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What should we call ReadableByteStream.prototype.getBYOBReader()? #294

Closed
domenic opened this issue Mar 11, 2015 · 25 comments
Closed

What should we call ReadableByteStream.prototype.getBYOBReader()? #294

domenic opened this issue Mar 11, 2015 · 25 comments
Assignees

Comments

@domenic
Copy link
Member

domenic commented Mar 11, 2015

Per the lengthy discussions in #253 the new shape of ReadableByteStream will be something like:

  • getReader() returns a reader that, much like the "normal" ReadableStream, has automatically flowing auto-allocated chunks. So, rbs.getReader().read() will take no arguments and will auto-allocate a new Uint8Array each call.
  • Some other function, with straw-person name getBYOBReader(), returns a reader that requires you to feed it buffers. So, you call rbs.getBYOBReader().read(myBuffer) and it reads into myBuffer. (After doing appropriate detaching/transferring to ensure no observable data races, etc., as discussed extensively in Support reading bytes into buffers allocated by user code on platforms where only async read is available #253 and elsewhere.)

Here "BYOB" stands for "bring your own buffer," by the way.

Ideas for names:

  • getBYOBReader()
  • getManualReader()
  • getBufferedReader()
  • getFedReader()
  • getManualFeedReader()
  • getControlledAllocationReader()
  • getReadIntoReader()
  • getBytesReader() (although the normal reader also gives you bytes)
  • getPullingReader() (referring to how generally no reading is done until you call read(view), i.e. data is "pulled")
  • getReader({ feedBuffers: true }) or similar (introduces some kinda-strange return type variance though)

Are there any precedents from other platforms we can draw on, just for vocabulary if nothing else?

@yutakahirano
Copy link
Member

Naming after reader's type name looks natural to me.

@domenic
Copy link
Member Author

domenic commented Mar 12, 2015

Well, but do you have an idea for the reader's type name? :)

@yutakahirano
Copy link
Member

[Readable]BytesStreamReader and getBytesReader, perhaps?

@domenic
Copy link
Member Author

domenic commented Mar 12, 2015

Yeah that kind of works, since it's about specialized behavior for byte streams. But then again, rbs.getReader().read() will return bytes too, hmm. I'll add it to the list :D

@chrisdickinson
Copy link

What about inverting it, such that getReader is the BYOB version, while the auto-allocating version becomes getAllocatingReader?

@domenic
Copy link
Member Author

domenic commented Mar 12, 2015

@chrisdickinson that's not as good, I think, because ideally getReader() on both ReadableStream and ReadableByteStream should return an auto-allocating readers with a no-argument read().

@tyoshino
Copy link
Member

In offline discussion last Thursday, we categorized the readers we've been designing using a matrix.

Alloc / Pull Auto Manual
BYOB - A
Auto B C

(BYOB && Auto pull) doesn't make sense. So, it's marked -.

We're going to at least provide factory methods for A, B.

The "Alloc: auto" row also applies to object streams. Both of B and C make sense. I think C means that pulling one element.

For byte streams, I realized that I don't know what we should do for C well. C is manual pull. We're going to pull some bytes. How many? The user needs to call read() to pull bytes though they cannot specify the amount of data to pull. Domenic, what do you think about C? Is this kinda method to say "please read now, but decide how much to pull by yourself" to the reader?


So, if A, B and C make sense for byte streams, and B and C make sense to object streams, how should we organize the factory methods?

  • Byte stream
    • getBYOBManualPullReader: A
    • getAutoPullReader: B
    • getManualPullReader: C
  • Object stream
    • getAutoPullReader: B
    • getManualPullReader: C

Gave verbose names for the first step.

@domenic
Copy link
Member Author

domenic commented Mar 16, 2015

Domenic, what do you think about C? Is this kinda method to say "please read now, but decide how much to pull by yourself" to the reader?

Yeah, exactly. This would probably be done intelligently on a per-stream basis. IIRC libuv has heuristics for this (since Node streams are not BYOB), @piscisaureus mentioned them to me once.

It also says, "I don't care about reusing buffers, and will let the JavaScript GC take care of freeing the memory".


I am starting to think that an options-object accepting getReader() may be better. In case we add more dimensions of variability in the future, it would avoid a combinatorial explosion in method names. Let us think what that would look like:

  • Byte stream
    • getReader({ feedBuffers: true, flowing: true }): throws
    • getReader({ flowing: true }): B
    • getReader({ feedBuffers: true }): A
    • getReader(): C
  • Object stream
    • getReader({ flowing: true }): B
    • getReader(): C

Or maybe we want the default to be auto-pulling, so instead of flowing: true it is manualPull: true. (Better name?) Then we have

  • Byte stream
    • getReader({ feedBuffers: true, manualPull: true }): A
    • getReader({ manualPull: true }): C
    • getReader({ feedBuffers: true, manualPull: false }): throws
    • getReader(): B
  • Object stream
    • getReader({ manualPull: true }): C
    • getReader(): B

The interesting thing about object streams B vs. C is that previously we kind of let this be controlled by the strategy instead of by the consumer. (Although even then I think we always would be trying to pull at least some before exerting backpressure.) We could move that all to configuration to getReader?

Wait, but how does C work for object streams that are wrapping push sources? I think the manualPull: true only makes sense for: (a) APIs that are inherently pull-based like read(2)-on-files; (b) APIs that allow us to use the kernel buffer to store data (but that only goes up to a limit before we start losing data...) like recv(2)-on-sockets. But it does not work for APIs like web sockets that are pushing data at you via events and you will lose the data if you don't react...

@domenic
Copy link
Member Author

domenic commented Mar 16, 2015

Upon further reflection I am not sure auto-pulling vs. manual-pulling is a property of the reader. It may be more of a property of the stream. If it is a property of the reader, you should be able to get an auto-pulling reader, then release it and get a manual-pulling one. Or vice versa. That seems strange and potentially hard to implement.

@tyoshino
Copy link
Member

OK. Maybe "auto-pull" concept doesn't make sense when discussing a reader interface. What's going on behind a reader is unknown and abstracted. We basically shouldn't mention about anything behind it when naming the methods.

Pulling vs. non-pulling (give me when available) is the right way of comparison?

@domenic
Copy link
Member Author

domenic commented Mar 16, 2015

Tentative plan per afternoon discussions:

  • Use options object to allow greater future flexibility

  • Instead of "manualPull: false/true", use a queueSize argument, for both object and byte streams. If it is zero, then pulling is manual. (If feedBuffers is true, it must be zero.) If it is nonzero n, it communicates two things:

    • The consumer would prefer that, on a best-effort basis, the stream have n bytes available to fulfill future .read() promises quickly. (This means for sources like files, the underlying source should pull enough bytes proactively to meet n.) If n is 10 MiB, then that might mean a single .read() promise gets fulfilled with a 10 MiB chunk, or the next 10 .read() promises get fulfilled with 1 MiB chunks, depending on which is more convenient for the stream implementer.
    • The consumer would prefer not to consume much more memory than n. If the stream is getting close to or exceeding this threshold, then the stream should apply backpressure (or stop pulling proactively, for a case like files).
  • To make nonzero queueSize work, an underlying source must implement the pull()method---if that is not implemented, trying to pass queueSize will throw.

  • To make feedBuffer: true work, an underlying source must implement the read() method---if that is not implemented, trying to pass feedBuffers will throw.

  • Cases like a ReadableStream wrapping a WebSocket (i.e., wrapping an API that does not support backpressure signals) will not implement underlying source pull(), causing queueSize to never work. Instead a simple .getReader() call must be used, since we should not allow consumers to signal a desired queueSize if there is no way to control it.

  • Strategy will probably change. The most likely path looks to be getting rid of shouldApplyBackpressure and moving size() to the underlying source directly.

  • It would be beneficial to get the more precise backpressure APIs prototyped in Prototyping several changes for better support for reading bytes #287, e.g. maybe enqueue() should return the size left in the queue or similar. We should work on this.

  • The implementation may get complicated in how we allow scenarios like

    const reader = rbs.getReader({ queueSize: 1024 });
    
    // later, after the queue has filled up 
    reader.releaseLock();
    const reader2 = rbs.getReader({ feedBuffers: true });
    reader2.read(new Uint8Array(512));

    This kind of exotic scenario will probably necessitate copying (at least if implemented in JavaScript; UAs might be able to get away with better tricks for their internal queues).

@domenic
Copy link
Member Author

domenic commented Jun 18, 2015

@tyoshino I am coming back to this issue several months later and it seems we kind of forgot about a lot of the ideas in #294 (comment). Given how our thinking has evolved, e.g. in #329 and #361, what are your thoughts on that comment?

@tyoshino
Copy link
Member

tyoshino commented Jul 3, 2015

Thanks for reminder, Domenic. I'd like to begin with responding to #294 (comment)

Use options

This is still true, I think. Less methods sounds better basically. E.g. if we decide to implement various readers as I listed in #329 (comment), the option takes mode: "borrowing", etc.

manualPull: false/true

I'm gonna investigate if I can integrate the strategy mechanism with the up-to-date RBS (has pull/pullInto) well.

Cases like a ReadableStream wrapping a WebSocket ...

BYOB of the up-to-date RBS should just work (though involves copy).

@domenic
Copy link
Member Author

domenic commented Jul 6, 2015

I have spun off a lot of that stuff into #375. That leaves this thread for its original purpose, bikeshedding the name :).

So you think it should be

rbs.getReader({ mode: "byob" });
rbs.getReader({ mode: "default" }); // or... "pull"?
rbs.getReader(); // equivalent to default

The other alternative we were thinking was

rbs.getReader({ feedBuffers: true });
rbs.getReader({ feedBuffers: false });
rbs.getReader(); // equivalent to false

but I think that you are right that future expansion is better served by a string enum.

@tyoshino
Copy link
Member

tyoshino commented Jul 7, 2015

Ah, good point. Yeah, explicit parameters (boolean) for each feature is cleaner, but with mode string we can list only meaningful combinations and maybe robust for future extensions.

@tyoshino
Copy link
Member

ReadableByteStream has been merged into ReadableStream (#379). Now ReadableStream has both getReader and getBYOBReader().

It's time to resume discussion on this issue and finalize the BYOB API surface.

@domenic
Copy link
Member Author

domenic commented Mar 28, 2016

I think your arguments for getReader() + getReader({ mode: "byob" }) are still pretty good. Should we move to that?

@tyoshino
Copy link
Member

Reviewed the discussion in this thread. Yes, I still like the plan to add options to getReader().

Regarding the name, I think "buffer-provided" and "buffer-fed" are the candidates I'm still considering in addition to "byob".

  • mode: "default"
  • mode: "byob"
    • ReadableStreamBYOBReader
  • mode: "buffer-fed"
    • ReadableStreamBufferFedReader
  • mode: "buffer-provided"
    • ReadableStreamBufferProvidedReader

BTW, the differentiation between "auto pull" and "manual pull" discussed around #294 (comment) has been well depicted in the new examples written by Domenic.

#430 (comment) is also related.

In the latest spec, we chose to have the controllers expose queuing status via the desiredSize() getter to delegate pulling decision to the underlying source. Auto/manual pull is no longer part of the readable stream (and controller / reader) API but characteristics of underlying sources.

@tyoshino
Copy link
Member

Now I feel that ReadableStreamBYOBController is not the best naming for the controller with the BYOB feature. It has facilities for BYOB reading, but can also work with the default reader. Its main and essential characteristic is that it handles bytes (expects bytes, copies and repackages bytes).

So, I propose:

  • rename ReadableStreamBYOBController to ReadableStreamBytesController (similar to one of the candidates in Merge ReadableByteStream into ReadableStream #418 (comment))
  • renaming the "byob" parameter of the ReadableStream constructor to (source|controller)?(Type|Mode)
    • let the default value to be "default",
    • when its value is "bytes",

@domenic
Copy link
Member Author

domenic commented Mar 29, 2016

Hmm. I still like keeping BYOB in those names, because BYOB is the new capability on top of the default. I think it would be OK to keep it. I think of the byob: true option to the underlying source to be something like byobCapable: true (just shorter). I agree that its essential characteristic is that it handles bytes, but it's main new capability is a BYOB API.

Among byob, buffer-fed, and buffer-provided I think byob is pretty good. Maybe it is just because we have been staring at that for a few months though. I think buffer-fed is better than buffer-provided probably.

@tyoshino
Copy link
Member

Hmm. I still like keeping BYOB in those names, because BYOB is the new capability on top of the default. ...

In addition to #294 (comment), I'm thinking about the case in which we need to introduce the read() with ack functionality. If we introduce the feature, we would need to also add a new interface to the controller. Then, the controller will gain one more new feature in addition to BYOB. Considering such possibility, I was leaning toward giving more general name to it. Of course, we could just add a new controller. WDYT?

Among byob, buffer-fed, and buffer-provided I think byob is pretty good. Maybe it is just because we have been staring at that for a few months though. I think buffer-fed is better than buffer-provided probably.

OK. Let's go in "byob".

@domenic
Copy link
Member Author

domenic commented Apr 1, 2016

If we introduce the feature, we would need to also add a new interface to the controller. Then, the controller will gain one more new feature in addition to BYOB. Considering such possibility, I was leaning toward giving more general name to it. Of course, we could just add a new controller. WDYT?

Hmm, it is hard to say. Whether we introduce a new controller or not would depend on how different the internal state tracking would be.

I guess I see two options:

  • Go with BYOB, and maybe in the future change if we decide that the same controller needs more responsibilities, we can probably rename it since it is not very web-exposed
  • Avoid any possibility of future conflicts by just making the change to ReadableStreamBytesController (ReadableByteStreamController is probably better?) now.

I am OK with either.

So the total changes being contemplated here are:

  • Merge getBYOBReader into getReader using { mode: "byob" } as an argument
  • (Maybe) rename ReadableStreamBYOBController to ReadableStreamBytesController or ReadableByteStreamController

But no renaming for ReadableStreamBYOBReader or ReadableStreamBYOBRequest or for the underlying source option (byob: true). Right?

@tyoshino
Copy link
Member

tyoshino commented Apr 4, 2016

Thanks.

... depend on how different the internal state tracking would be.

Yeah. I can't come up with estimation of the complexity it would add for now.

... it since it is not very web-exposed ...

I see.

  • Merge getBYOBReader into getReader using { mode: "byob" } as an argument

Yes!

  • (Maybe) rename ReadableStreamBYOBController to ReadableStreamBytesController or ReadableByteStreamController
    But no renaming for ReadableStreamBYOBReader or ReadableStreamBYOBRequest or for the underlying source option (byob: true). Right?

I thought that if we rename ReadableStreamBYOBController, the underlying source option should also be renamed as it's an option telling what type of controller it needs or what set of reading operations the stream should expose.

But yeah, we could just add a new parameter say "bufferLeasing: true" in addition to byob.

So, ok.

@domenic
Copy link
Member Author

domenic commented Apr 5, 2016

I thought that if we rename ReadableStreamBYOBController, the underlying source option should also be renamed as it's an option telling what type of controller it needs or what set of reading operations the stream should expose.

Hmm, I see. I think the real question is: do we expect a 1:1:1 underlying source types <-> controller types <-> reader types. I guess we should not necessarily expect that. Right now we have 1:1 underlying source types <-> controller types, and that will probably stay forever. But the correspondence with reader types is already not 1:1 since you can get both default and byob readers from (what is currently called) a ReadableStreamBYOBController.

So, OK. We should reflect this asymmetry in the API. New plan:

  • Merge getBYOBReader into getReader using { mode: "byob" } as an argument
  • Rename ReadableStreamBYOBController to ReadableByteStreamController
  • Change the byob: true switch to type: "bytes"

(Maybe there is a better word than type, but I thought it was important to have different terms, so we could say that a given type of controller/stream/underlying source can have different modes for reading from it.)

tyoshino added a commit that referenced this issue Apr 6, 2016
tyoshino added a commit that referenced this issue Apr 6, 2016
tyoshino added a commit that referenced this issue Apr 6, 2016
@tyoshino tyoshino self-assigned this Apr 6, 2016
@tyoshino
Copy link
Member

tyoshino commented Apr 6, 2016

Yeah, in the last comment of mine, I thought we could have the ReadableStream constructor parameters each corresponds to enabled getReader() mode, but I prefer your idea. Yes, 1:1 correspondence between underlying source types and controller types should be stable for long time.

Please review the PR.

domenic pushed a commit that referenced this issue Apr 8, 2016
domenic pushed a commit that referenced this issue Apr 8, 2016
Instead of `byob: true`, it's now `type: "bytes"`. Partially fixes #294.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants