Garbage-free streaming fetch into a WebAssembly (Shared)ArrayBuffer? #1057

Closed
juj opened this issue Jul 21, 2020 · 4 comments

Comments


juj commented Jul 21, 2020

Looking at the BYOB reading example at https://streams.spec.whatwg.org/#example-manual-read-bytes, the code looks awkward because of the amount of JS garbage it generates. High-performance, animation-heavy WebAssembly applications strive to operate garbage-free, because JS GC pressure is known to cause microstuttering in WebGL rendering/animation.

Would it be possible to create an API that would enable garbage-free streaming into a(n) (Shared)ArrayBuffer?

The BYOB example on that page looks like this:

const reader = readableStream.getReader({ mode: "byob" });

let startingAB = new ArrayBuffer(1024);
readInto(startingAB)
  .then(buffer => console.log("The first 1024 bytes:", buffer))
  .catch(e => console.error("Something went wrong!", e));

function readInto(buffer, offset = 0) {
  if (offset === buffer.byteLength) {
    return Promise.resolve(buffer);
  }

  const view = new Uint8Array(buffer, offset, buffer.byteLength - offset); // (* garbage *)
  return reader.read(view).then(({ value: newView, done }) => { // (* also garbage? *)
    if (done) {
      return newView.buffer;
    }
    return readInto(newView.buffer, offset + newView.byteLength);
  });
}

The stream avoids an extra copy, but generates garbage. Compare this to a non-BYOB variant for a fetch:

function downloadFile(url) {
	return fetch(url)
	.then(response => {
		// Note: Content-Length is a string (or null); the Uint8Array constructor coerces it to a length.
		return [response.body, new Uint8Array(response.headers.get('Content-Length'))];
	}).then(body => {
		var reader = body[0].getReader();
		var buf = body[1];
		var totalDownloaded = 0;
		function onChunkDownloaded(data) {
			if (data.value) {
				buf.set(data.value, totalDownloaded); // (* excess memcpy *)
				totalDownloaded += data.value.length;
			}
			if (data.done) {
				return buf;
			} else {
				return reader.read().then(onChunkDownloaded);
			}
		}
		return reader.read().then(onChunkDownloaded);
	});
}
downloadFile('myLargeFile.dat').then(buf => {
	console.log('Downloaded file:');
	console.dir(buf);
});

This code does not generate excessive typed array views, but it has an extra memory copy.
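
For reference, here is a sketch of the same approach applied to the case in the title, i.e. copying into a view over a WebAssembly.Memory heap; the function name and the heap offset parameter are illustrative only. The set() call is the extra copy in question:

async function downloadIntoHeap(url, memory, heapOffset) {
  // Stream a fetch into a preallocated region of a WebAssembly.Memory heap.
  // Chunks delivered by the default (non-BYOB) reader still have to be copied in with set().
  const response = await fetch(url);
  const heap = new Uint8Array(memory.buffer); // view over the Wasm heap (a (Shared)ArrayBuffer)
  const reader = response.body.getReader();
  let written = 0;
  for (;;) {
    const { value, done } = await reader.read();
    if (value) {
      heap.set(value, heapOffset + written); // (* excess memcpy *)
      written += value.length;
    }
    if (done) return written; // total number of bytes copied into the heap
  }
}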

I wonder if there would be a way to get to a JS GC free + zero copy way of downloading files?

MattiasBuelens (Collaborator) commented

> This code does not generate excessive typed array views

While your code does not construct extra typed array views, the stream itself definitely does. Every data.value is a new Uint8Array, which will be garbage collected after your onChunkDownloaded returns.

Moreover, the Uint8Arrays constructed by the stream will all have their own separate backing ArrayBuffer (in view.buffer). And this is precisely where most of the overhead comes from: while a typed array like Uint8Array is fairly lightweight (it merely points to a slice of data), an ArrayBuffer needs to allocate its backing memory to actually hold the data.

With a BYOB reader, you construct a new ArrayBuffer (and its backing memory) once, and then you let the stream fill in a view on that buffer. Yes, the stream has to construct a new view when it returns the result (because it might not have filled the whole view), but again: these views are lightweight.
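
As a minimal sketch of that pattern (the function name is illustrative, and an async context is assumed): the backing ArrayBuffer is allocated once up front, and the only per-read allocations are those lightweight views, plus the promises and result objects.

async function readAllInto(readableStream, buffer) {
  const reader = readableStream.getReader({ mode: "byob" });
  let offset = 0;
  while (offset < buffer.byteLength) {
    // Hand the stream a view over the not-yet-filled tail of the buffer.
    const { value: view, done } = await reader.read(
      new Uint8Array(buffer, offset, buffer.byteLength - offset)
    );
    if (view) {
      // read() detaches the buffer we passed in; keep using the buffer
      // returned behind the new (lightweight) view.
      buffer = view.buffer;
      offset += view.byteLength;
    }
    if (done) break;
  }
  reader.releaseLock();
  return new Uint8Array(buffer, 0, offset); // the bytes actually read
}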

> I wonder if there would be a way to get to a JS GC free + zero copy way of downloading files?

I doubt you can eliminate all garbage collection, and I don't think that should be a goal. Even if you were to eliminate the typed arrays by making the API closer to something like C++'s fread (e.g. read(buffer, offset, length)), that call would still return a Promise - which will also need to be GC'd after it resolves.

Trying to get rid of all these small objects would make the API much harder to use, since you'd have to get rid of a lot of useful abstractions like typed arrays and promises. I'd argue that this would feel "alien" for most JavaScript developers.

ricea (Collaborator) commented Jul 21, 2020

> I wonder if there would be a way to get to a JS GC free + zero copy way of downloading files?

At least in Chrome, our goal is not to eliminate GC but to use generational GC, so that as long as only a small number of short-lived objects is generated, you won't see any GC pauses. You can track the progress of this effort here: https://bugs.chromium.org/p/chromium/issues/detail?id=1029379.

The streams standard creates lots of short-lived promises internally. It's theoretically possible to eliminate those promises for platform-generated streams, but in practice we don't because it would add a lot of complexity to the implementation. Instead we rely on the lower-level GC to be efficient.

I don't know about other browsers, but I suspect they are also working on ways of letting JavaScript interface with WebAssembly while avoiding GC pauses from temporary objects.

domenic (Member) commented Jul 21, 2020

See also #757

juj (Author) commented Jul 28, 2020

Thanks for the replies. The reasoning makes sense, although expecting the lower-level GC to be efficient is a lost battle for Wasm real-time rendering applications. Fortunately, pages do not need to download data all the time, so any stuttering will be limited to periods of data download; I suppose the API convenience/simplicity will be worth it.

juj closed this as completed Jul 28, 2020