
Should keys/values/entries be async iterators? #6

Closed
domenic opened this issue Feb 12, 2018 · 35 comments

Labels
api A proposal for changing the API surface

Comments

@domenic
Collaborator

domenic commented Feb 12, 2018

Right now they return promises for arrays. This can be convenient; once you await them, they're just normal arrays.

However, an async iterator would map better to the underlying IndexedDB technology (cursors which advance one by one), and would work a lot better for cases with many entries.

Should we make these return async iterators?

An alternate design would be to have them return promises-for-arrays, but also have keysIter/valuesIter/entriesIter as async-iteration versions.

domenic added the api label Feb 12, 2018
@jakearchibald

Is there a proposal or something that makes it easy to turn an async iterator into an array? Array.asyncFrom() or Promise.asyncAll() or something.

@domenic
Collaborator Author

domenic commented Feb 13, 2018

Not at the moment. I'd be hesitant to propose such a thing because it encourages writing code that defeats the purpose of streaming. It's better to have people do the per-item processing in a streaming, async fashion using for-await-of.

@jakearchibald

In that case I'd have keys/values/entries return a promise for an array, so they can use getAll under the hood, which I understand is faster than cursors.

Storage objects could implement async iterators for cursor-like behaviour:

for await (const [key, value] of storage) {
  // …
}

@domenic
Collaborator Author

domenic commented Feb 14, 2018

Oh, I didn't know about getAll. And very neat idea on just having Symbol.asyncIterator, instead of dedicated APIs.

@inexorabletash
Member

One wrinkle with getAll(): there is getAll() and getAllKeys(), but no way to do both in a single call. Fortunately, with IDB's transaction semantics, if you issue both in sequence you're guaranteed to get consistent results, e.g.

const r1 = store.getAllKeys();
const r2 = store.getAll();
r2.onsuccess = e => {
  // if r2 succeeds then r1 must have succeeded already.
  const [keys, values] = [r1.result, r2.result];
  ...
};

@inexorabletash
Member

inexorabletash commented Feb 26, 2018

As an aside, in case it's not clear: you won't be able to directly map cursors to async iterators, for the reasons explored in https://github.com/inexorabletash/indexeddb-promises - the transaction is only kept alive during iteration until a non-IDB task slips into the microtask processing, at which point it auto-closes.

So if you want async iteration you'll want to snapshot (e.g. via getAll()) and implement the async-ness in userspace.
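For concreteness, here is a minimal sketch of that snapshot-then-iterate approach (not from the thread; db is assumed to be an open IDBDatabase, and the store name is a parameter):

async function* entriesSnapshot(db, storeName) {
  // Snapshot keys and values within a single readonly transaction.
  const store = db.transaction(storeName, 'readonly').objectStore(storeName);
  const keysReq = store.getAllKeys();
  const valuesReq = store.getAll();
  const [keys, values] = await new Promise((resolve, reject) => {
    // If valuesReq succeeds, keysReq (issued first) has already succeeded.
    valuesReq.onsuccess = () => resolve([keysReq.result, valuesReq.result]);
    valuesReq.onerror = () => reject(valuesReq.error);
  });
  // The transaction may auto-close now; iteration runs over the snapshot.
  for (let i = 0; i < keys.length; i++) {
    yield [keys[i], values[i]];
  }
}

Consumers would then write for await (const [key, value] of entriesSnapshot(db, 'kv')) { … } as usual.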

@jakearchibald

Ah, yeah, in which case there isn't really any point in using async iterators.

@domenic
Collaborator Author

domenic commented Feb 28, 2018

Agreed. I guess that sadly closes this in the negative, at least until IDB gets the necessary primitives.

domenic closed this as completed Feb 28, 2018
domenic added a commit that referenced this issue Feb 28, 2018
Per #6 (comment) we can just wait for the second request.
@pwnall

pwnall commented Oct 17, 2018

If this is still on the table, I beg to differ -- I think we should do async iteration, because it gives us more room to evolve the implementation.

Specifically, async iteration gives us the option to protect against excessive memory consumption and OOMs if the database grows a lot. Today, we can use getAll with a count and buffer some results (say, 1k - 10k) to offset the cost of having a transaction per read. If IndexedDB evolves better primitives, we can take advantage of them in future revisions.
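As a rough illustration of that batching strategy (a hypothetical sketch, not part of the proposal: one readonly transaction per batch, with an open lower-bound range resuming where the last batch ended):

async function* batchedEntries(db, storeName, batchSize = 1000) {
  let lastKey;
  while (true) {
    const store = db.transaction(storeName, 'readonly').objectStore(storeName);
    // After the first batch, resume just past the last key we saw.
    const range = lastKey === undefined
        ? undefined
        : IDBKeyRange.lowerBound(lastKey, /*open=*/true);
    const keysReq = store.getAllKeys(range, batchSize);
    const valuesReq = store.getAll(range, batchSize);
    const [keys, values] = await new Promise((resolve, reject) => {
      valuesReq.onsuccess = () => resolve([keysReq.result, valuesReq.result]);
      valuesReq.onerror = () => reject(valuesReq.error);
    });
    if (keys.length === 0) return;
    for (let i = 0; i < keys.length; i++) yield [keys[i], values[i]];
    lastKey = keys[keys.length - 1];
  }
}

At most batchSize keys and values are held in memory at a time, and a future IndexedDB primitive could replace the per-batch transactions without changing the API shape.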

@domenic
Collaborator Author

domenic commented Oct 17, 2018

Thanks for chiming in. I see the point and, with a more future-facing look, tend to agree. Let's reopen as a first step.

What should we do about naming? In all cases we should add [Symbol.asyncIterator](), but what about the keys/values/entries functions? A few options (not exhaustive):

  • Only supply [Symbol.asyncIterator](), which gives [key, value] pairs, alongside promise-for-array keys()/values()/entries()
  • Make keys()/values()/entries() return async iterators, and have separate keysArray()/etc. functions. (I'm not a big fan of this because it mismatches other async maps on the platform.)
  • Make keys()/values()/entries() return promises-for-arrays, and have separate keysIter()/etc. functions

domenic reopened this Oct 17, 2018
@asutherland

Why expose any promise-for-array variations? In w3c/ServiceWorker#1066 we saw a case where the naive "return everything in a single array" design for cache.keys() would break things once a lot of stuff was stored in a Cache. Exposing promise-for-array for symmetry with existing Map metaphors when we know it becomes a foot-gun as storage usage grows seems counterproductive.

Along these lines, async iterators give browsers the ability to help consumers of the API avoid tying up the main thread by breaking up the results into batches. API consumers will get what they want and what browsers want as part of the API.

And the IndexedDB escape-hatch allows those who really want everything all at once to fall back to IndexedDB and getAll/getAllKeys.

@domenic
Collaborator Author

domenic commented Dec 19, 2018

If we think other async maps on the platform have made a mistake, then departing from them and having keys()/values()/entries() return an async iterator makes sense to me. And yeah, I guess the more conservative thing is to omit the promise-for-array variants. I'll work on that spec change now.

@domenic
Collaborator Author

domenic commented Dec 20, 2018

I've put up a rough draft, so far only for keys(), at https://github.com/domenic/async-local-storage/compare/async-iterators?diff=split . I'd welcome comments on the specification style, which is fairly unusual. (Probably some of it will migrate to Web IDL.)

On the substantive side, one thing that came up is the tension between flexibility and predictability. In particular, what happens if you modify the database during iteration? Concretely, what should this do?

storage.set(1, "1!");
storage.set(2, "2!");

let i = 3;
for await (const k of storage.keys()) {
  console.log(k);
  storage.set(i, `${i}!`);
  ++i;
}

If you use the getAllKeys() strategy, this will log 1, 2, then stop. If you use a batching strategy with batch size 1, this will loop forever. If you use a batching strategy with batch size 2, it depends on whether you do your check-for-more-batches after grabbing the batch (loop will exit) or before a new batch is requested (loop will run forever). If you use a batching strategy with size 100, it will log 1, 2, then stop.

This seems pretty bad for predictability. Are we OK with saying that the results are undetermined between browsers in this way? Or should we lock this down more?

For example, we could mandate a set batch size. Or, we could mandate using getAllKeys() for now and hope that in the future if IndexedDB gets nice primitives then all browsers can switch at once, while keeping the same API shape. Or, we could completely get rid of keys()/values()/entries() from v1.

Or, maybe there's something cleverer that I haven't thought of.

@asutherland

I hope the IndexedDB transaction model already covers this. keys() gets a "readonly" transaction that it uses for all retrievals. The set() call creates a "readwrite" transaction whose mutations will never be seen by the keys() "readonly" transaction per https://w3c.github.io/IndexedDB/#transaction-lifetime-concept. So the bits about "and perform a new transaction each time to retrieve the next key" would need to be removed.

I see a few wrinkles that need to be addressed with this, though:

  1. The polyfill has to deal with the realities of self-closing transactions. So it needs to at least keep getting one more result value per call, or the transaction closes and semantics break. This limits how lazy the polyfill can be.
  2. The spec would want to reasonably guarantee that code that calls storage.keys() but indefinitely defers processing the async iterator does not cause storage.set('foo', 1); storage.get('foo'); to indefinitely stall. Implementation-wise, this translates to "You must not keep the readonly transaction backing the async iterator alive if your transactions implementation isn't clever enough to keep things going; instead you need to slurp the data into memory." (Some implementations, like Firefox's IndexedDB-on-SQLite-WAL can potentially just rely on the semantics enabled by SQLite's WAL to let "readonly" transactions be long-lived.)
  3. The spec would want to decide and explicitly state how an indefinitely deferred keys() consumption is addressed. This is necessary because an implementation doing something clever with long-lived "readonly" transactions will at some point still need to close out that transaction so WAL check-pointing can compact the database and stop the entire disk from filling up. I think the options are:
    • The iterator is allowed to throw if you don't consume it in a timely fashion. This is hard because the obvious definition of "timely fashion" here is the IndexedDB self-closing transaction timeline, which is arguably unfriendly. (But does simplify the polyfill!) And anything longer than that is likely to be surprising to users or a nightmare for cross-browser consistency.
    • The implementation is required to consume the iterator's values into memory when it would close the underlying "readonly" transaction so that the values can be provided when eventually desired. And if that would cause an OOM, it causes an OOM. I like this one.

@domenic
Collaborator Author

domenic commented Dec 20, 2018

Ah, so you'd use a single transaction, just potentially multiple key-retrieval calls? And for now that would mean you'd have to do all your key retrieval calls within one event loop turn, since IndexedDB transactions always auto-close right now. (And thus, probably you'd just do a single key retrieval call.)

But later, if IndexedDB gets better primitives for non-auto-closing, async local storage could layer on top of that, and the only difference would be the timing of the results, not their contents.

Am I understanding correctly?

@inexorabletash
Member

On vacation, so keeping this short: an alternative is iterating in user-space across transactions. Store the last key returned, and use an open-lower-bound range to get the next key/value, e.g.

const range = IDBKeyRange.lowerBound(last_key, /*open=*/true);
// getKey()/get() are issued on the object store, within one transaction:
const rk = store.getKey(range);
const rv = store.get(range);
rv.onsuccess = e => {
   last_key = rk.result;
   if (last_key === undefined) { end_of_data(); }
   else { async_yieldify(rk.result, rv.result); }
};

The semantics roughly match what happens to a cursor that's iterating within a read/write transaction where the data changes in IDB today.

@asutherland

@domenic Not quite as bad as one event loop turn. In order to keep the transaction active, you need to dispatch another request in the same event loop turn in which the last outstanding request succeeds/errors. So you would fetch batch N+1 once batch N comes back. This just means the polyfill's batch filling would be driven by how fast IDB can deliver records, not how fast the keys() caller wants to consume them.
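A minimal sketch of that request-chaining constraint (names and batch size are illustrative; the consumer-side buffering is elided):

function pumpKeyBatches(store, onBatch, lastKey) {
  const range = lastKey === undefined
      ? undefined
      : IDBKeyRange.lowerBound(lastKey, /*open=*/true);
  const request = store.getAllKeys(range, 100);
  request.onsuccess = () => {
    const keys = request.result;
    if (keys.length === 0) return; // done; the transaction can now auto-close
    onBatch(keys); // buffer batch N for the (possibly slower) consumer
    // Issuing batch N+1 synchronously inside this success handler is what
    // keeps the readonly transaction active.
    pumpKeyBatches(store, onBatch, keys[keys.length - 1]);
  };
}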

Re: open-lower-bound range, that seems quite reasonable to me too. Since the values in ConvenienceStore are structured-cloned rather than strings, if a consumer wants to atomically store two related values, they can just stick them in an object { foo, bar } keyed "fooBar" rather than storing them in separate keys "foo" and "bar". This would be consistent with the ordering of get("foo") followed by get("bar"), which would happen in two transactions. And I guess that also makes it clear that this is the right way to handle it. If a caller wants a snapshot, they can use IndexedDB on the store.

@domenic
Collaborator Author

domenic commented Jan 2, 2019

@inexorabletash's idea seems good to me. I'll work on prototyping that in Blink, and writing a bunch of tests for it, then I'll come back here with a spec patch.

@domenic
Collaborator Author

domenic commented Jan 3, 2019

Update: as far as I can tell, the only way to get the first key in IDB is to use cursors; get() or getKey() will not allow a null range argument, and there appears to be no way to construct an unbounded range using the IDBKeyRange factory methods. So, this will need a slightly different strategy, unless I'm missing something (which is quite possible).

@vweevers

vweevers commented Jan 3, 2019

and there appears to be no way to construct an unbounded range using the IDBKeyRange factory methods

How about IDBKeyRange.lowerBound(-Infinity)?

@asutherland

objectStore.getAll(null, 1) and objectStore.getAllKeys(null, 1) should work for values and keys, respectively. (Leveraging their query, count signatures.)
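For example, fetching the first key would look like this (a sketch; store is assumed to be an IDBObjectStore):

const firstKeyReq = store.getAllKeys(null, 1);
firstKeyReq.onsuccess = () => {
  const firstKey = firstKeyReq.result[0]; // undefined if the store is empty
  // …
};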

@domenic
Collaborator Author

domenic commented Jan 3, 2019

objectStore.getAll(null, 1) and objectStore.getAllKeys(null, 1) should work for values and keys, respectively. (Leveraging their query, count signatures.)

Yeah, that's what I ended up doing; I had to have separate getFirstKey/getFirstKeyValuePair and getKey/getKeyValuePair functions, which was a bit icky, but it all got abstracted over.

How about IDBKeyRange.lowerBound(-Infinity)?

That might be even better, thanks! I'll try it once I get the tests passing against the current version of my branch.

@inexorabletash
Member

Would pushing additional primitives into IndexedDB to make things easier here be reasonable? I don't believe it's a goal for this API to be polyfillable on any existing engines, is it?

re: IDBKeyRange.lowerBound(-Infinity) - see some discussions in w3c/IndexedDB#76 about potentially adding keys that sort before -Infinity. That said, we can evolve this API's definition and IDB in parallel, so it's not a problem as long as we remember to update both specs (or merge them as was suggested at TPAC by @asutherland maybe?)

Adding something like IDBKeyRange.unbounded() seems feasible - the spec already has the concept. Alternately, we could just change get() and getKey() to allow null, which already means this for most queries; it's just explicitly forbidden for those calls. IIRC it was just an early decision to throw in that case to prevent developer confusion. Apart from tests, it should be web-compatible to relax that (chase down "null disallowed flag" mentions in the spec).

Also, adding getKeyAndValue() has been requested before, to avoid the API/resource overhead of a cursor. I'd be down with adding that too.

@domenic
Collaborator Author

domenic commented Jan 7, 2019

Would pushing additional primitives into IndexedDB to make things easier here be reasonable?

For sure, although there's a question of how far we go: e.g. we could go as far as adding fully async-iterator-friendly extensible transactions, or better cursors, or even async-iterator cursors. My initial preference is to build on top of what's specced at this time, and co-evolve to simplify the layered ALS spec on top of any future IDB additions.

I don't believe it's a goal for this API to be polyfillable on any existing engines, is it?

Correct, just layered on top of the "theoretical web platform" provided by the stable-ish portion of the specification ecosystem.

merge them as was suggested at TPAC

Yeah, that's still the plan; need to move this to WICG, then when it's ready to graduate, merge into IDB.

Adding something like IDBKeyRange.unbounded() seems feasible - the spec already has the concept. Alternately, we could just change get() and getKey() to allow null which already means this for most queries, it's just explicitly forbidden for those calls. IIRC it was just an early decision to throw in that case to prevent developer confusion. Apart from tests, it should be web-compatible to relax that (chase down null disallowed flag mentions in the spec).

Also, adding getKeyAndValue() has been requested before, to avoid the API/resource overhead of a cursor. I'd be down with adding that too.

These all seem quite nice to me, but I'm not sure I'm the right person to drive the spec/test/implementation updates; as tempting as it is, I think just using what we have right now is the right path for my time investment. But, I'm happy to be convinced otherwise. Thoughts?

chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Jan 11, 2019
Following the spec discussion at WICG/kv-storage#6. This CL is used for prototyping and tests.

Change-Id: If5262a32d42eb1c7abaad4fb2a029ce44e94834e
@tabatkins

I can't tell from this discussion whether the impl strategy for async iterators would allow you to easily iterate over the "original" version of the db, and mutate it in the loop without seeing the changes. I think it's important for usability to make that case easy, even if it's not the easiest thing to do; having to drop down to IDB and understand the specifics of the transaction semantics (per @inexorabletash's example in #6 (comment)) is... less than good. (Or, equivalent in meaning but very different implementation-wise, do a read-only loop, building up all desired mutations in a side-list, then do a write-only loop, writing all the mutations to the db. Also v awkward.)

For eager iterators this is easy; for(const val of [...iter()]) {...} works just fine. It would be unfortunate if this was substantially more difficult for async iterators.

I might just be asking for Promise.asyncAll(), a la Jake's earlier request, and thus wouldn't actually require this API to do anything in particular. "You shouldn't do that" isn't a valid response to this request, I think; you shouldn't generally eagerly pull all the data out of an eager iterator, either, but it's sometimes useful.

@inexorabletash
Member

There's also...

navigator.locks.request('kvstorage', async lock => {
  for await (const [k,v] of kvstorage.entries()) {
    ...
  }
});

@tabatkins

Ah, that's a slightly easier way to iterate the db so I can synthesize it into a Promise.all()-compatible form?

@domenic
Collaborator Author

domenic commented Jan 14, 2019

I agree there's generally a missing language-level primitive for converting an async iterator into a promise for an array. I'll start noodling on that. I did suggest maybe having something built-in for convenience in #6 (comment), but reactions seemed negative to making this "antipattern" too easy.

In the meantime, it is a pretty simple wrapper to write:

async function collectAsyncIterator(ai) {
  const result = [];
  for await (const x of ai) {
    result.push(x);
  }
  return result;
}
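With the proposed API shape, usage would then look like, e.g.:

const allEntries = await collectAsyncIterator(storage.entries());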

I don't really understand @inexorabletash's suggestion (what work is the navigator.locks lock doing?) or @tabatkins's response :)

@tabatkins

@inexorabletash's suggestion was a simpler way to iterate the indexeddb directly; presumably with the assumption of a "kvstorage" lock that the kvStorage library also uses, so you can be sure you've consumed the whole thing without intervening edits.

But yeah, I guess the converter between async-iterable and Promise<Array> is pretty trivial.

@inexorabletash
Member

Apologies for being vague. I was attempting to address:

...easily iterate over the "original" version of the db, and mutate it in the loop without seeing the changes...

... and somewhat mis-parsed as wanting to be able to iterate without seeing changes coming from another context. On re-reading, I don't think that's what @tabatkins was getting at. Attempting to restate his use case using different terminology: iterate over a snapshot, while making changes during iteration that aren't reflected in the iteration.

My suggestion just uses locks to introduce a transaction so that the iteration isn't polluted by changes from another context (assuming all contexts wrap access by grabbing a lock), which is a solution to a different problem.

In general, the "operate on a snapshot while allowing mutations" is something databases provide (leveldb, sqlite, etc), although is expensive and not guaranteed, so e.g. IndexedDB doesn't guarantee it. In IDB, if you have a read-only transaction, you don't get to mutate. If you have a read-write transaction, you observe your own writes even prior to the transaction commit.

@vweevers

In general, the "operate on a snapshot while allowing mutations" is something databases provide (leveldb, sqlite, etc), although is expensive [..]

At the risk of derailing this thread (is there a better place to discuss this?): asynchronously reading from a snapshot would be very nice to have. Might not be that expensive in Chrome at least, seeing as it uses LevelDB under the hood (and LevelDB snapshots are lightweight).

My suggestion just uses locks to introduce a transaction so that the iteration isn't polluted by changes from another context

@inexorabletash Relying on Web Locks would only work for small data sets, right? E.g. you wouldn't want to lock everything while slowly reading a million rows.

IndexedDB doesn't guarantee it.

@inexorabletash Can you expand on that? I was under the assumption that IndexedDB does offer "snapshot guarantees" on a read-only transaction. And (level-js) unit tests confirm this, in Chrome, FF and Opera (IE and Edge occasionally fail).

@inexorabletash
Member

(Apologies for the further derail. Happy to take this up in an issue on the IDB repo if more discussion is needed - https://github.com/w3c/IndexedDB)

Relying on Web Locks would only work for small data sets, right?

Probably not a great application architecture, no. (aside: Web Locks do support shared/exclusive)

I was under the assumption that IndexedDB does offer "snapshot guarantees" on a read-only transaction.

The spec details when transactions can be started; the rule is that RO transactions should see consistent data, and it clarifies that this can be done either by having RO transactions block R/W transactions, or by having RO transactions snapshot so that subsequent R/W transactions can start. Currently, only Chrome snapshots -- in other browsers, R/W transactions are blocked until RO transactions finish -- but we're changing that as part of some architectural rework of the implementation.

Restating: the IDB spec guarantees that no changes are observed within a RO transaction, but allows that to be provided with or without snapshots.

@inexorabletash
Member

inexorabletash commented Jan 15, 2019

Also...

asynchronously reading from a snapshot would be very nice to have. Might not be that expensive in Chrome at least, seeing as it uses LevelDB under the hood (and LevelDB snapshots are lightweight).

IMHO, this is better tackled with even lower level primitives than kv-storage, e.g. allowing LevelDB in the browser via a WASM-friendly storage layer, so you literally run LevelDB and can tune it for your app's needs.

@domenic
Collaborator Author

domenic commented Jan 15, 2019

Update: I have an implementation I'm happy with, and a boatload of tests, over at https://chromium-review.googlesource.com/c/chromium/src/+/1396453 . It has pretty nice behavior if you try to set values while iterating and such.

If anyone has more test suggestions let me know (here or there). I'll get back to working on a spec patch tomorrow, using that as the basis.

@domenic
Collaborator Author

domenic commented Jan 16, 2019

First draft up at #47. I realized I have no tests for the error cases yet, so writing those, and potentially fixing the spec if it handles them badly, is the next step.
