Should keys/values/entries be async iterators? #6
Is there a proposal or something that makes it easy to turn an async iterator into an array?
Not at the moment. I'd be hesitant to propose such a thing because it encourages writing code that defeats the purpose of streaming. It's better to have people actually do the per-item processing in a streaming, async fashion using for-await-of.
In that case I'd have keys/values/entries return a promise for an array, so they can use getAll.

Storage objects could implement async iterators for cursor-like behaviour:

```js
for await (const [key, value] of storage) {
  // …
}
```
Oh, I didn't know about getAll. And very neat idea on just having Symbol.asyncIterator, instead of dedicated APIs.
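For concreteness, a minimal sketch of the promise-for-array approach, assuming `db` is an open IDBDatabase and `'kv'` is the object store name (both names invented here, not part of the proposal):

```js
// Hedged sketch: keys() as a promise-for-array, snapshotting via getAllKeys().
function keys(db) {
  return new Promise((resolve, reject) => {
    const store = db.transaction('kv', 'readonly').objectStore('kv');
    const request = store.getAllKeys();
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}
```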
One wrinkle with getAll: to implement entries() you need both the keys and the values, which takes two requests. Since requests within a transaction complete in order, it's enough to wait for the second one:

```js
const r1 = store.getAllKeys();
const r2 = store.getAll();
r2.onsuccess = e => {
  // if r2 succeeds then r1 must have succeeded already.
  const [keys, values] = [r1.result, r2.result];
  // …
};
```
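A hedged sketch of how that two-request pattern could back an entries() that resolves to [key, value] pairs, assuming the same `store` from a 'readonly' transaction:

```js
function entries(store) {
  return new Promise((resolve, reject) => {
    const r1 = store.getAllKeys();
    const r2 = store.getAll();
    r2.onerror = () => reject(r2.error);
    r2.onsuccess = () => {
      // r1 completed first (requests finish in order), so both results are ready.
      resolve(r1.result.map((key, i) => [key, r2.result[i]]));
    };
  });
}
```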
As an aside, in case it's not clear: you won't be able to directly map cursors to async iterators, for the reasons explored in https://github.com/inexorabletash/indexeddb-promises - the transaction only stays alive during iteration until a non-IDB task slips into the microtask processing. So if you want async iteration you'll want to snapshot (e.g. via getAll).
Ah, yeah, in which case there isn't really any point in using async iterators.
Agreed. I guess that sadly closes this in the negative, at least until IDB gets the necessary primitives.
Per #6 (comment) we can just wait for the second request.
If this is still on the table, I beg to differ -- I think we should do async iteration, because it gives us more room to evolve the implementation. Specifically, async iteration gives us the option to protect against excessive memory consumption and OOMs if the database grows a lot. Today, we can use getAll with a count and buffer some results (say, 1k - 10k), to offset the cost of having a transaction per read. If IndexedDB evolves better primitives, we can take advantage of them in future revisions.
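A hedged sketch of that batching strategy, assuming `db` is an open IDBDatabase and `'kv'` is the store name (both invented for illustration). It relies on the fact that numbers sort before every other IndexedDB key type, so lowerBound(-Infinity) spans all keys:

```js
async function* keysBatched(db, batchSize = 1000) {
  let range = IDBKeyRange.lowerBound(-Infinity); // spans all keys
  while (true) {
    // A fresh, short-lived transaction per batch.
    const store = db.transaction('kv', 'readonly').objectStore('kv');
    const batch = await new Promise((resolve, reject) => {
      const r = store.getAllKeys(range, batchSize);
      r.onsuccess = () => resolve(r.result);
      r.onerror = () => reject(r.error);
    });
    if (batch.length === 0) return;
    yield* batch;
    // Resume strictly after the last key seen.
    range = IDBKeyRange.lowerBound(batch[batch.length - 1], /*open=*/true);
  }
}
```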
Thanks for chiming in. I see the point and, with a more future-facing look, tend to agree. Let's reopen as a first step. What should we do about naming? In all cases we should add …
Why expose any promise-for-array variations? In w3c/ServiceWorker#1066 we saw a case where the naive "return everything in a single array" design for cache.keys() would break things once a lot of stuff was stored in a Cache. Exposing promise-for-array for symmetry with existing Map metaphors when we know it becomes a foot-gun as storage usage grows seems counterproductive.

Along these lines, async iterators give browsers the ability to help consumers of the API avoid tying up the main thread by breaking up the results into batches. API consumers will get what they want and what browsers want as part of the API. And the IndexedDB escape-hatch allows those who really want everything all at once to fall back to IndexedDB and getAll/getAllKeys.
If we think other async maps on the platform have made a mistake, then departing from them and having keys()/values()/entries() return an async iterator makes sense to me. And yeah, I guess the more conservative thing is to omit the promise-for-array variants. I'll work on that spec change now.
I've put up a rough draft, so far only for keys(), at https://github.com/domenic/async-local-storage/compare/async-iterators?diff=split . I'd welcome comments on the specification style, which is fairly unusual. (Probably some of it will migrate to Web IDL.)

On the side of substantial issues, one thing that came up is the tension between flexibility and predictability. In particular, what happens if you modify the database during iteration? Concretely, what should this do?

```js
storage.set(1, "1!");
storage.set(2, "2!");
let i = 3;
for await (const k of storage.keys()) {
  console.log(k);
  storage.set(i, `${i}!`);
  ++i;
}
```

If you use the getAllKeys() strategy, this will log 1, 2, then stop. If you use a batching strategy with batch size 1, this will loop forever. If you use a batching strategy with batch size 2, it depends on whether you do your check-for-more-batches after grabbing the batch (the loop will exit) or before a new batch is requested (the loop will run forever). If you use a batching strategy with size 100, it will log 1, 2, then stop.

This seems pretty bad for predictability. Are we OK with saying that the results are undetermined between browsers in this way? Or should we lock this down more? For example, we could mandate a set batch size. Or, we could mandate using getAllKeys() for now and hope that in the future, if IndexedDB gets nice primitives, all browsers can switch at once while keeping the same API shape. Or, we could completely get rid of keys()/values()/entries() from v1. Or, maybe there's something cleverer that I haven't thought of.
I hope the IndexedDB transaction model already covers this. keys() gets a "readonly" transaction that it uses for all retrievals. The set() call creates a "readwrite" transaction whose mutations will never be seen by the keys() "readonly" transaction, per https://w3c.github.io/IndexedDB/#transaction-lifetime-concept. So the bits about "and perform a new transaction each time to retrieve the next key" would need to be removed. I see a few wrinkles that need to be addressed with this, though: …
Ah, so you'd use a single transaction, just potentially multiple key-retrieval calls? And for now that would mean you'd have to do all your key retrieval calls within one event loop turn, since IndexedDB transactions always auto-close right now. (And thus, probably you'd just do a single key retrieval call.) But later, if IndexedDB gets better primitives for non-auto-closing, async local storage could layer on top of that, and the only difference would be the timing of the results -- not their contents. Am I understanding correctly?
On vacation, so keeping this short: an alternative is iterating in user-space across transactions. Store the last key returned, and use an open-lower-bound range to get the next key/value, e.g.

```js
// `store` is the object store from a fresh transaction; `last_key` is the last key returned.
const range = IDBKeyRange.lowerBound(last_key, /*open=*/true);
const rk = store.getKey(range);
const rv = store.get(range);
rv.onsuccess = e => {
  last_key = rk.result;
  if (last_key === undefined) { end_of_data(); } // no key past last_key: done
  else { async_yieldify(rk.result, rv.result); } // hand [key, value] to the iterator
};
```

The semantics roughly match what happens to a cursor that's iterating within a read/write transaction where the data changes in IDB today.
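A hedged sketch of that per-item pattern wrapped in an async generator, with one fresh transaction per step (assumed names: `db` is an open IDBDatabase, `'kv'` the store; numbers sort before all other IDB key types, so lowerBound(-Infinity) covers every key on the first step):

```js
async function* entries(db) {
  let range = IDBKeyRange.lowerBound(-Infinity); // spans all keys initially
  while (true) {
    const store = db.transaction('kv', 'readonly').objectStore('kv');
    const rk = store.getKey(range);
    const rv = store.get(range);
    const [key, value] = await new Promise((resolve, reject) => {
      rv.onsuccess = () => resolve([rk.result, rv.result]); // rk finished before rv
      rv.onerror = () => reject(rv.error);
    });
    if (key === undefined) return; // nothing past the last key: end of data
    yield [key, value];
    range = IDBKeyRange.lowerBound(key, /*open=*/true); // strictly after the key just yielded
  }
}
```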
@domenic Not quite as bad as one event loop turn. In order to keep the transaction active you need to dispatch another request in the same event loop turn in which the last currently outstanding request succeeds/errors. So you would fetch batch N+1 once batch N comes back. This just means that the polyfill batch filling would be driven by how fast IDB can deliver records, not how fast the keys() caller wants to consume them.

Re: open-lower-bound range, that seems quite reasonable to me too. Since the values in ConvenienceStore are structured-cloned rather than strings, if a consumer wants to atomically store two related values, they can just stick them in an object.
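To illustrate that keep-alive rule, a hedged sketch (invented names: `pump`, `buffer`, `done`; batch size 100 is arbitrary) that issues the next request from within the previous request's success handler, so the transaction never hits an idle microtask checkpoint:

```js
function pump(store, range, buffer, done) {
  const r = store.getAllKeys(range, 100); // next batch of up to 100 keys
  r.onsuccess = () => {
    if (r.result.length === 0) { done(); return; } // no more requests: tx auto-commits
    buffer.push(...r.result);
    const next = IDBKeyRange.lowerBound(r.result[r.result.length - 1], /*open=*/true);
    pump(store, next, buffer, done); // dispatched in the same turn, keeping the tx active
  };
}
```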
@inexorabletash's idea seems good to me. I'll work on prototyping that in Blink, and writing a bunch of tests for it, then I'll come back here with a spec patch.
Update: as far as I can tell, the only way to get the first key in IDB is to use cursors; …

How about …
Yeah, that's what I ended up doing; I had to have separate getFirstKey/getFirstKeyValuePair and getKey/getKeyValuePair functions, which was a bit icky, but it all got abstracted over.
That might be even better, thanks! I'll try it, once I get the tests passing against my batching version.
Would pushing additional primitives into IndexedDB to make things easier here be reasonable? I don't believe it's a goal for this API to be polyfillable on any existing engines, is it?

re: Adding something like … Also, adding …
For sure, although there's a question of how far do we go: e.g. we could go as far as to add fully async-iterator friendly extensible transactions, or better cursors, or even async iterator cursors. My initial preference is to do things based on top of what's specced at this time, and co-evolve to simplify the layered ALS spec on top of any future IDB additions.
Correct, just layered on top of the "theoretical web platform" provided by the stable-ish portion of the specification ecosystem.
Yeah, that's still the plan; need to move this to WICG, then when it's ready to graduate, merge into IDB.
These all seem quite nice to me, but I'm not sure I'm the right person to drive the spec/test/implementation updates; as tempting as it is, I think just using what we have right now is the right path for my time investment. But, I'm happy to be convinced otherwise. Thoughts?
Following the spec discussion at WICG/kv-storage#6. This CL is used for prototyping and tests. Change-Id: If5262a32d42eb1c7abaad4fb2a029ce44e94834e
I can't tell from this discussion whether the impl strategy for async iterators would allow you to easily iterate over the "original" version of the db, and mutate it in the loop without seeing the changes. I think it's important for usability to make that case easy, even if it's not the easiest thing to do; having to drop down to IDB and understand the specifics of the transaction semantics (per @inexorabletash's example in #6 (comment)) is... less than good. (Or, equivalent in meaning but very different implementation-wise, do a read-only loop, building up all desired mutations in a side-list, then do a write-only loop, writing all the mutations to the db. Also v awkward.) For eager iterators this is easy; I might just be asking for …
There's also...

```js
navigator.locks.request('kvstorage', async lock => {
  for await (const [k, v] of kvstorage.entries()) {
    // …
  }
});
```
Ah, that's a slightly-easier way to iterate the db so I can synthesize it into a …
I agree there's generally a missing language-level primitive for converting an async iterator into a promise for an array. I'll start noodling on that. I did suggest maybe having something built-in for convenience in #6 (comment), but reactions seemed negative to making this "antipattern" too easy. In the meantime, it is a pretty simple wrapper to write:

```js
async function collectAsyncIterator(ai) {
  const result = [];
  for await (const x of ai) {
    result.push(x);
  }
  return result;
}
```

I don't really understand @inexorabletash's suggestion (what work is the navigator.locks lock doing?) or @tabatkins's response :)
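For instance (hypothetical usage, assuming a kv-storage-style `storage` object):

```js
const allEntries = await collectAsyncIterator(storage.entries());
```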
@inexorabletash's suggestion was a simpler way to iterate the IndexedDB directly; presumably with the assumption of a "kvstorage" lock that the kvStorage library also uses, so you can be sure you've consumed the whole thing without intervening edits. But yeah, I guess the converter between async-iterable and …
Apologies for being vague. I was attempting to address:
... and somewhat mis-parsed it as wanting to be able to iterate without seeing changes coming from another context. On re-reading, I don't think that's what @tabatkins was getting at. Attempting to restate his use case using different terminology: iterate over a snapshot, while making changes during iteration that aren't reflected in the iteration. My suggestion just uses locks to introduce a transaction so that the iteration isn't polluted by changes from another context (assuming all contexts wrap access by grabbing a lock), which is a solution to a different problem.

In general, "operate on a snapshot while allowing mutations" is something databases provide (leveldb, sqlite, etc.), although it is expensive and not guaranteed, so e.g. IndexedDB doesn't guarantee it. In IDB, if you have a read-only transaction, you don't get to mutate. If you have a read-write transaction, you observe your own writes even prior to the transaction commit.
At the risk of derailing this thread (is there a better place to discuss this?): asynchronously reading from a snapshot would be very nice to have. Might not be that expensive in Chrome at least, seeing as it uses LevelDB under the hood (and LevelDB snapshots are lightweight).
@inexorabletash Relying on Web Locks would only work for small data sets, right? E.g. you wouldn't want to lock everything while slowly reading a million rows.
@inexorabletash Can you expand on that? I was under the assumption that IndexedDB does offer "snapshot guarantees" on a read-only transaction. And (…)
(Apologies for the further derail. Happy to take this up in an issue on the IDB repo if more discussion is needed - https://github.com/w3c/IndexedDB)
Probably not a great application architecture, no. (aside: Web Locks do support shared/exclusive)
The spec details when transactions can be started; the rule is that a RO transaction should see consistent data, and it clarifies that this can be done by having RO transactions block R/W transactions, or by having RO transactions snapshot so that subsequent R/W transactions can start. Currently, only Chrome snapshots -- in other browsers, R/W transactions are blocked until RO transactions finish -- but we're changing that as part of some architectural rework of the implementation. Restating: the IDB spec guarantees that no changes are observed within a RO transaction, but allows that to be provided with or without snapshots.
Also...

IMHO, this is better tackled with even lower-level primitives than kv-storage, e.g. allowing LevelDB in the browser via a WASM-friendly storage layer, so you literally run LevelDB and can tune it for your app's needs.
Update: I have an implementation I'm happy with, and a boatload of tests, over at https://chromium-review.googlesource.com/c/chromium/src/+/1396453 . It has pretty nice behavior if you try to set values while iterating and such. If anyone has more test suggestions let me know (here or there). I'll get back to working on a spec patch tomorrow, using that as the basis.
First draft up at #47. I realized I have no tests for the error cases yet, so writing those, and potentially fixing the spec if it handles them badly, is the next step.
Right now they return promises for arrays. This can be convenient; once you `await` them, they're just normal arrays.

However, an async iterator would map better to the underlying IndexedDB technology (cursors, which advance one by one), and would work a lot better for cases with many entries.
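For reference, a hedged sketch of that underlying cursor pattern in raw IndexedDB (`'kv'` is an assumed store name):

```js
const request = db.transaction('kv', 'readonly').objectStore('kv').openCursor();
request.onsuccess = () => {
  const cursor = request.result;
  if (!cursor) return; // iteration finished
  console.log(cursor.key, cursor.value);
  cursor.continue(); // advance one entry at a time
};
```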
Should we make these return async iterators?
An alternate design would be to have them return promises-for-arrays, but also have keysIter/valuesIter/entriesIter as async-iteration versions.
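For concreteness, a hedged sketch of how the two shapes would read at the call site (usage only; nothing here is settled API):

```js
// Promise-for-array shape:
const keys = await storage.keys(); // a normal array once awaited

// Async-iterator shape:
for await (const key of storage.keys()) {
  // process each key as it arrives
}
```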