Allow using sqlite's OPFS_SAH backend #39
SQLocal does not use that backend because of the drawbacks it has (summarized here). However, I think for large batches of inserts, the bigger bottleneck is serializing all of the data between the main thread and the web worker that the database runs in. SQLocal is mainly for interacting with the database from the main thread and aims to abstract away the worker, but I would be open to making a utility of some sort that makes it easier to run bulk inserts directly in the worker to avoid the serialization overhead. Do you think that would help in your use case?
I looked at the timings using a profiler and 99% of the time is spent in the sqlite wasm, so I don't think the serialisation overhead is the issue here. Note that I do agree that the default should probably remain the regular OPFS VFS, but for applications where multiple tabs or downloading the sqlite file is not a requirement, it would be nice to have an option to use the SAH backend.
Interesting. Can you give some more details on your use case? How much data are you inserting? Is it a one-time setup or a recurring process? Why is multi-tab support not needed? Mind sharing what your app is used for?
It's a music app where a lot of metadata is loaded beforehand (to allow offline browsing). Since it also supports playback, it doesn't really make sense to have multiple tabs open. About 100MB of data is synced, usually once or twice per day.
What kind of performance are you seeing when you do that sync? How long do the inserts take in total? I'm looking at the benchmarks for the SAH VFS versus the original OPFS VFS, and INSERT performance appears to be virtually identical between them. The main difference is that SELECTs are much faster on the SAH VFS, so I'm not sure switching to SAH would help in your use case.
I use upserts to add the data, which does make it slow, since an upsert includes lookups under the hood. It goes from a few hundred milliseconds on the SAH backend to almost a minute on the regular one.
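For reference, a SQLite upsert is an `INSERT ... ON CONFLICT(...) DO UPDATE`, so every row implies a lookup on the conflict target, which is the read-heavy part mentioned above. Below is a minimal sketch of building one multi-row upsert per batch so a bulk sync issues fewer statements. `buildUpsert`, `flattenParams`, and the `tracks` table are hypothetical, and batching does not change how fast the underlying VFS performs the lookups themselves.
```js
// Hypothetical helper: one multi-row upsert statement for a batch of rows.
// Identifiers are interpolated (not bound), so they must be trusted strings.
// Keep rowCount * columns.length under SQLite's bound-parameter limit
// (SQLITE_MAX_VARIABLE_NUMBER, 32766 by default since SQLite 3.32.0).
function buildUpsert(table, columns, conflictColumn, rowCount) {
  const placeholders = `(${columns.map(() => '?').join(', ')})`;
  const values = Array.from({ length: rowCount }, () => placeholders).join(', ');
  // `excluded.<col>` refers to the incoming row that hit the conflict.
  const updates = columns
    .filter((c) => c !== conflictColumn)
    .map((c) => `${c} = excluded.${c}`)
    .join(', ');
  return (
    `INSERT INTO ${table} (${columns.join(', ')}) VALUES ${values} ` +
    `ON CONFLICT(${conflictColumn}) DO UPDATE SET ${updates}`
  );
}

// Flattens row objects into the positional parameters the statement expects.
function flattenParams(rows, columns) {
  return rows.flatMap((row) => columns.map((c) => row[c]));
}
```
Running each batch inside a single transaction, in whatever exec call your driver exposes, avoids paying a commit per statement.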
Thanks, that's useful to know. I'll have to investigate the SAH VFS a bit more to see how viable it is. I did some experimentation with it, but I kept running into it throwing disk I/O errors intermittently. In the meantime, any further insights you have would be helpful, and PRs are of course welcome too.
If I remember correctly, when using the
I'm guessing that the
The project I'm working on is using sqlite-wasm with the
We did run into an issue on initialization of the sqlite database where the browser would report that the file was locked even though it wasn't, but in that case the solution was to retry the request. More info in this issue (sqlite/sqlite-wasm#79).
This all being said, while I/O issues haven't been a problem, there have been other challenges (basically entirely with Safari--god I hate Safari 😡). E.g. when using the SAH approach you need to elect a leader tab to house the sqlite instance. We use the Web Locks API to elect a leader tab. Occasionally we've been hit with a browser bug (basically a Safari-only bug, though technically it's happened elsewhere) where the browser fails to release a web lock when the tab is destroyed. This can cause a new leader to fail to be elected. Safari seems to have gotten better about this behavior though, as I haven't run into it recently. A bigger problem is that, on iOS, Safari will aggressively suspend background tabs. When the background tab is the leader, that's a problem. Sending the leader a
The reason we chose the SAH approach is that, from what I've read, the non-SAH approach basically becomes unusable because of transaction contention when there are too many tabs open. We figured that if we need to control concurrent access to the database because of transaction contention, we might as well go with the SAH approach since it's the most performant. Has transaction contention not been a problem for SQLocal @DallasHoff?
That's a lot of good insights. Thank you, @jorroll! I just did the same test I did before with the SAH VFS, and I'm not seeing the I/O errors anymore. That could be due to a fix made between then and now to sqlite-wasm or the changes to SQLocal itself, which now has its own locking mechanism so that if a transaction is attempted while another transaction is in progress on the same database, the second transaction will wait until the first is done. I've heard of that approach to using the SAH VFS of electing a leader tab using Web Locks. My initial idea for using the SAH VFS was to have sqlite-wasm run in a
All of these issues are fixable by the respective parties, but I do not expect that any time soon, so it seems like the "leader tab" approach is the only viable one, even though it's not as robust as any of us would prefer. An idea I had to mitigate the issues with leader tabs being put to sleep or Web Locks not getting released is to have the leader tab periodically send out a "heartbeat" message on a
I'd like to work on supporting more VFS's with SQLocal soon, because providing that abstraction, where every SQLocal function works the same no matter which VFS you use, makes it really easy for users of the library to switch between VFS's or fall back as needed. The first step of this was making in-memory databases support SQLocal's full feature set, and I'll be releasing those changes very soon. After that, I'll come back to investigating the other VFS's, especially the SAH VFS.
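The heartbeat idea could be sketched roughly as follows. The channel the heartbeat travels on is cut off in the comment above, so this assumes anything with a `postMessage` method (for example a `BroadcastChannel`). `startHeartbeat`, `HeartbeatMonitor`, and the timeout values are hypothetical, and the clock is injectable so the logic can be tested. As raised later in the thread, detecting a stale leader does not necessarily release the OPFS lock a suspended leader still holds.
```js
// Hypothetical sketch of the heartbeat idea. `channel` is assumed to be anything
// with a postMessage method (e.g. a BroadcastChannel); `now` is injectable.
function startHeartbeat(channel, intervalMs, now = Date.now) {
  const timer = setInterval(() => {
    channel.postMessage({ type: 'heartbeat', at: now() });
  }, intervalMs);
  return () => clearInterval(timer); // call when this tab stops being leader
}

// Follower side: remembers when the last heartbeat arrived.
class HeartbeatMonitor {
  constructor(timeoutMs, now = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastSeen = now(); // treat monitor start as the last known sign of life
  }

  // Wire to the channel, e.g. channel.onmessage = (e) => monitor.handleMessage(e.data)
  handleMessage(message) {
    if (message && message.type === 'heartbeat') this.lastSeen = this.now();
  }

  // True once no heartbeat has been seen for longer than the timeout.
  isLeaderStale() {
    return this.now() - this.lastSeen > this.timeoutMs;
  }
}
```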
Hello! I put together a very basic PR that just allows configuring the library to use the SAH VFS, since I have similar needs to @jorroll, and am going to implement a leader tab/locking mechanism as in https://www.notion.com/blog/how-we-sped-up-notion-in-the-browser-with-wasm-sqlite. Currently I'm using yarn patch on this library to get this functionality, but I'd love the option to just use the main branch here.
@DallasHoff the heartbeat idea is very interesting and makes a lot of sense to me also!
While I'm not sure, I suspect this will not work, @DallasHoff. If the leader still exists but is unresponsive, it might (probably does) still hold a lock on the database file, and I'm not sure there's currently any way to "steal" an OPFS file lock. I.e. you might find that, while you can detect that the leader is unresponsive, you still cannot elect a new leader because the old leader still has a lock on the database file. Mind, I don't know if this is the case, but it's a potential problem.
It's also worth stating that, in practice, the leader being suspended is a problem we've only seen in Safari and (to my memory) only on iOS. Our solution is to only use a persisted sqlite database on mobile if our app is running as a progressive web app (PWA). If our app is running as a PWA, then we know there is only ever a single tab and that the current tab is the leader tab. This lets us avoid electing a leader and avoid using a shared worker. Because SharedWorkers aren't supported on Android, we also apply the same restriction there (we only enable persistence in the PWA on Android). In practice, only enabling persistence in the PWA on mobile has proven to be a reasonable restriction for our users. When someone logs into our app on mobile, we warn them that persistence isn't supported unless they install the app, and we invite them to install the PWA version on their device (providing instructions via https://github.com/khmyznikov/pwa-install).
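The persistence policy described above might look roughly like this. `shouldEnablePersistence` is a hypothetical helper; `(display-mode: standalone)` is the standard media query that matches an installed PWA, while detecting "mobile" from the user agent is a crude assumption made only for illustration.
```js
// Hypothetical policy helper: persist on desktop, but on mobile only when the
// app runs as an installed PWA. `env` is injected; in a browser pass
// { matchMedia: (q) => window.matchMedia(q), userAgent: navigator.userAgent }.
function shouldEnablePersistence(env) {
  // Matches when the app was launched as an installed PWA.
  const isStandalone = env.matchMedia('(display-mode: standalone)').matches;
  // Crude user-agent check, for illustration only.
  const isMobile = /Android|iPhone|iPad|iPod/i.test(env.userAgent);
  // Installed PWA => a single tab that is always the leader, so no election
  // and no SharedWorker are required.
  return !isMobile || isStandalone;
}
```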
Browsers (well, Chrome) are currently exploring APIs for allowing concurrent access to a SQLite database from multiple tabs. I.e. you could acquire a sync access handle to the sqlite database file without locking the file for other threads. I think there's a blog post on web.dev that explores this option, as well as an open issue in one of the standards repos somewhere. This would allow each tab to create its own dedicated worker that connects to sqlite using the
Worth noting that, for web apps, in practice if you're using persisted sqlite you also need an in-memory database. This is because you'll (probably) want to apply optimistic updates and have them rendered in less than 16ms to achieve 60fps. For a browser-based web application I think this effectively requires maintaining an in-memory database in addition to a persisted one. E.g. hypothetically you could use a wasm build of sqlite (running in a worker) as the source of truth for your app, and that data would be persisted. But while sqlite itself might be able to resolve queries fast enough to render a frame in less than 16ms, sending the query to the worker via postMessage and getting a response back on the main thread via postMessage can't be guaranteed within any time frame. In practice I've found that postMessages across threads in the browser can take surprisingly long to resolve (e.g. 100s of ms or longer). My perception is that the slowdown isn't due to sqlite, but is instead just how long the browser can take to postMessage across threads (if you're sending 100s or 1000s of postMessages within a second, the browser queues them and resolves them one by one, and the last one can resolve 100s of ms after the first).
The solution our app uses is an in-memory sqlite database on the main thread plus a persisted sqlite database hosted by a leader tab. Both databases have the same schema. The in-memory db is the "source of truth" for rendering the app on the main thread. When resolving a query, we typically serve the results synchronously from the in-memory db but also send the query to the persisted db asynchronously. When the persisted query resolves, we load the results into the tab's in-memory db and rerender data as appropriate. If we send a query to the server, we load the results both into the tab's in-memory db and into the persisted db.
While we don't need to use in-memory sqlite as the synchronous datastore on the main thread, it's an attractive option since we're already using persisted sqlite and we can reuse the query logic in both places.
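The read path described above (answer synchronously from memory, then refresh from the persisted database) could be sketched like this. `queryWithRefresh` and the `memoryDb`/`persistedDb` interfaces are hypothetical: `memoryDb.query` is assumed to be synchronous, `persistedDb.query` returns a Promise (e.g. a postMessage round trip to the leader), and `memoryDb.load` upserts rows into the in-memory copy.
```js
// Hypothetical sketch of the two-database read path.
function queryWithRefresh(memoryDb, persistedDb, sql, params, onUpdate) {
  // 1. Render immediately from memory: no cross-thread round trip.
  const initial = memoryDb.query(sql, params);

  // 2. In the background, ask the persisted database and merge its answer.
  const refreshed = persistedDb.query(sql, params).then((rows) => {
    memoryDb.load(rows);
    const latest = memoryDb.query(sql, params);
    onUpdate(latest); // rerender with the merged result
    return latest;
  });

  return { initial, refreshed };
}
```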
Hello folks 👋 and happy new year 🥳 Since coincident has been mentioned, but what's described in here is well behind its latest state, I'd like to chime in with some recent discoveries around the SharedWorker approach ... probably stating the obvious, but it wasn't obvious to me until I'd tried every possible approach:
For these reasons I've created a new project called accordant, which is currently able to expose any
```js
import { SharedWorker, broadcast } from 'accordant/main';

const sw = new SharedWorker('./shared-worker.js', {
  // invoked per each broadcast from the SharedWorker
  [broadcast](...args) {
    console.log('SharedWorker', ...args);
  }
});

// and now you could (top-level await, so inside an ES module):
await sw.sql`SELECT * FROM table`;
```
On the SharedWorker side, you would do this instead:
```js
import { broadcast, exports } from 'accordant/shared-worker';

exports({
  sql: (template, ...values) => doYourSQLTagThing(template).exec(values),
  sql_notify: async (template, ...values) => {
    await doYourSQLTagThing(template).exec(values);
    // do anything else or broadcast changes/updates/query result
    broadcast('database updated', ...['any', 'value']);
  },
});
```
Add template-strings-array to the mix, so that you'd export the following instead:
```js
import { exports } from 'accordant/shared-worker';
import asTag from 'template-strings-array/tag';

exports({
  sql: asTag((template, ...values) => doYourSQLTagThing(template).exec(values)),
});
```
and in case the uniqueness of the template matters, we should be 100% covered in terms of functionality. In summary:
Hopefully nothing I've said was too boring. I just wanted to share my current state of affairs around all these topics, because we would also love to have SQLite on the origin private file system (instead of blobs stored in IndexedDB), so if there's anything I can help with, anything I've said wrong, or any hint around these topics, I am all ears! Thanks for your patience in reading up to this point, and I hope we'll manage to fix this one way or another with the best DX we can think of 🤩
I have nothing to add to the conversation here other than to suggest everyone show some support/interest in getting SharedWorker support in Chromium for Android. They've already said it's a matter of when, not if. So if people can make it clear that this is needed, maybe it'll happen sooner rather than later!
Correct, it's not. Here's an open issue about making Workers and SharedWorkers accessible within the context of a ServiceWorker, which is the closest I'm aware of to this being addressed. Curious, where does MDN claim otherwise?
In Comms we use the Web Locks API to accomplish this. It requires an initialization dance, but basically both sides of a
Note that OPFS (which SQLite requires for meaningful persistence) only makes the sync access handle API available inside the DedicatedWorker scope, so it can't be used inside a SharedWorker. So either you do what this library does and create a dedicated worker for each tab, with the workers taking turns writing to the database, or you elect a leader tab and route messages from every tab to it (this is what we do in Comms). The first option doesn't require (or seem to benefit from) a SharedWorker. In the second approach the SharedWorker is typically used to acquire a handle to the leader tab (ServiceWorkers can also be used), but the main thread still communicates directly with the DedicatedWorker that sqlite is running in. All of this is to say that it's not clear the goal you outline (making sqlite accessible via a SharedWorker) is particularly useful given the current limitations surrounding sqlite, @WebReflection. Though maybe I'm misunderstanding what you're proposing.
First paragraph: https://developer.mozilla.org/en-US/docs/Web/API/SharedWorker
Interesting, never tried that dance ... I use random IDs (via crypto) to create a communication channel that would avoid broadcasting the module logic events in case other libraries would like to add listeners that don't belong to them (and vice-versa) ... using the same via MessageChannel might be fairly trivial and if Web Lock works properly I guess it'd be a win-win ... not sure any of this is needed in coincident but it feels surely a good thing to have for both ServiceWorker and SharedWorker use cases, thanks.
I didn't know that ... it also somehow makes sense, because a SharedWorker cannot, and should not, be blocked, just like a main thread ... which is also why SharedArrayBuffer and/or Atomics (sync wait) won't work in there either ... however, I'd like to understand if
In short, what I read here today feels like a dead-end for the OPFS + SQLite shared idea unless:
Well, if you want persistence, I expect you'll turn to the easiest means of getting it. If your database can fit in localStorage, there's an official sqlite-wasm VFS for that. And as you note, if the database can fit in memory, there are also projects out there that persist it to IndexedDB and just load the whole thing into memory on start. Not sure how the IndexedDB approach compares to an OPFS approach complexity-wise. If you had multiple tabs/contexts persisting to IndexedDB but running in memory, then they wouldn't learn of updates made in other contexts without extra work (e.g. if another tab changes IndexedDB, you'd need to bring those changes into your tab's in-memory instance of sqlite somehow).
Ya, that's very misleading. I wonder if it's purely a mistake or if "workers" in this context was intended to mean other SharedWorkers (which does work, I believe). Either way, very misleading.
There's a proposal somewhere to add some kind of
Tangentially related, but I'm not aware of a great library for facilitating complex communication between DedicatedWorkers, SharedWorkers, and the main thread. Libraries like Comlink are great for promise-based operations, but they don't work well for requests that receive an indefinite stream of responses. Additionally, there's still a lot of manual error handling you need to add. For example, if using Comlink, what if you send a message to the leader tab but then that tab is destroyed before responding? Comlink won't surface issues like this (I don't think). On paper, the closest I've found is https://github.com/daniel-nagy/transporter, but (a) it's beta and (b) it's got a heavy API, which gives me pause (e.g. the maintainer has created their own observable implementation rather than just using rxjs). Would be nice if there were more options to choose from in this vein.
I was really hoping that @WebReflection's coincident (talked about above) could be a high-performance replacement for Comlink, but it sounds like it won't be getting support for SharedWorkers.
Dead easy (and I have various libraries using IDB already, including SQLite-related ones) ... as long as you (or the user) remember to eventually save stuff (as a blob or buffer) at some point, it's one operation to read and one operation to save back ... it already works well both synchronously (via workers + Atomics) and asynchronously (SharedWorker only), but if your laptop dies or your browser crashes, the unsaved stuff is lost (likely via OPFS too, or rather, ignored on the next SQLite bootstrap if the transaction didn't fully commit). RAM is a concern though, so it clearly doesn't scale.
accordant handles it all: you offer an API to query, change, and update things, and in there you
You can also skip the whole broadcast by awaiting in your current port and then broadcasting updates or results to all others ... it's pretty cool and was born as a SharedWorker library; it can cover everything except blocking/sync API calls, due to the obvious limitations imposed by the SharedWorker context.
Comlink is something we ditched at PyScript because it was incapable of providing what coincident provides: a 100% synchronous API that can take over the whole main thread (any of them), writing code in a sync/natural way as you would on the main thread itself. It has served PyScript well for ~2 years now and it's there to stay, but it relies on the Worker's ability to drive pretty much anything via Atomics.wait, including UI-based libraries, all from a worker. The SharedWorker would've been a dream, but it's clear that's not the right place, because each tab could temporarily block the shared worker and that would lead to disasters, while a stuck dedicated worker has never been an issue to date, across a plethora of live demos and use cases the community has presented.
Having a sync, yet non-blocking, SQLite that works would be the next dream, but while it's already possible via dedicated workers, it's an impossible mess to deal with via shared workers, so I think we can live with that, as long as we can propagate updates and broadcast any data to other ports, which is what accordant does with zero need for Atomics and SharedArrayBuffer (hence easier on host headers too). This latter part was more for @nickchomey than this discussion, but at least now "all my cards" have been revealed to all others 👋
To clarify, coincident came up because SQLocal uses it for createScalarFunction, since it allows SQLite to call main thread functions synchronously (thanks for your work on coincident @WebReflection ❤️), and if we developed something where SQLite ran inside a SharedWorker, SQLocal would need some equivalent mechanism that works with SharedWorkers if we wanted it to support user-defined SQL functions. That's definitely not the main blocker to supporting the SAH VFS, though. I just wanted to mention it as something to be aware of in that list. The main show-stopper is sqlite-wasm's dependence on
As a general update, I've got a branch up where I'm working on making SQLocal more pluggable so that it's easier to add support for alternative VFS's, custom compilations of sqlite-wasm (if someone wanted additional extensions, for instance), or even alternative SQLite implementations. It should be a good step toward making SAH support easier. I'm testing it out by making a driver for sqlite-wasm's KVVFS right now.
Worth noting that I think you only need something like coincident because you want to call user defined sql functions which are on the main thread. If the worker thread could import the user-defined function, then coincident shouldn't be needed for this particular functionality. Separately, calling functions on the main thread from within the sqlite dedicated worker seems like it could tank performance, simply because
Very cool :) It's frustrating how complex working with multiple threads currently is in the browser. And separately how slow
@jorroll coincident erases that complexity … I can drive both main and Node or Bun from a never-blocking worker and it works fine even with UI-only libraries. Of course things must be done in the right place, so that the worker is a data provider and doesn't hook much into main thread logic, but it's not as slow as you mentioned: PyScript has used coincident for more than a year and things work just fine even with the extra indirection the WASM interpreter adds. That said, I agree that in a SharedWorker defining functions from main makes no sense, and so it shouldn't make much sense in workers either, but if you think DX, on the main thread you create the worker that provides SQLite and you're done, controlling everything it does from the main thread … that's super convenient, imho, even if not ideal. With shared workers, though, having the same possibility feels wrong, harakiri-prone, or collision-prone, so I think, because the SharedWorker can't be a foreign script but must be a file on the same domain, it's easier to define functions once in there.
Addendum: in Chromium. I haven't tested in Firefox, but it certainly works in Safari. This is a bug, and if you want it to be fixed, please help ring the bell on this ticket:
If that gets fixed, that would open a lot of possibilities for libraries that depend on sabayon, right @WebReflection?
@nickchomey the question remains: what would be a concrete use case for that? a shared worker has many tabs/windows' ports attached ... what would a roundtrip from a SharedWorker SAB look like?
I've recently (literally today) published a channel primitive so that I can rewrite accordant on top of it and simplify most of the async-shenanigans-based logic behind the unique channel. I haven't ported it to sabayon or coincident yet, and I am still struggling to see a use case to tackle when a SharedWorker can't be blocked and having SAB would be useful ... can anyone in here please enlighten me on this topic? Why is SAB in a SharedWorker a welcome feature, besides the browsers' inconsistency? 🤔
I can tell you that OpenFin encounters problems related to this with notable frequency. In an environment like that, the user may have multiple browser contexts active and consuming the same data, but due to this Chromium limitation, the data must be cloned in memory for each context. If you're working with big data, that adds up quickly.
@brianblakely thanks! But my question, or maybe ignorance, remains ... a SharedArrayBuffer is a reference that passes forward and receives back its integrity (data-wise) ... so, are we saying that a single SharedArrayBuffer can be passed along to multiple threads and also be manipulated "simultaneously" by multiple threads before being notified? And how does the notification itself, as orchestration, work? I am a bit confused here because I thought the whole point of SAB was to be a unique, single, shareable, and transparent, transferable piece of memory, to only one other endpoint, that could be observed from a single owner, that's it!
Hi @WebReflection, Yes, one SAB can be worked against by multiple threads. It’s up to the dev to manage this using Atomics. Simpler use cases can be handled with postMessage. I believe your misunderstanding lies in the last bit.
In fact SAB is designed to be shared and manipulated by multiple threads.
From the docs:
Here are some links I had filed away that (at some point) I found useful for SAB:
I stand corrected then (TIL) … I still need to understand what a use case could be, or rather, I see the possibility to circumvent the lock API through a shared buffer that simply notifies when busy or not, but if a shared worker cannot wait synchronously I struggle to see the benefits; time will tell.
sorry folks, but I need to ask ... the SAB (not necessary for this task?) is one topic, but sync SQLite is the desired goal, am I correct? If that's the case, what I read here:
All these questions make me think it's not about SAB, but rather about OPFS not being usable in a SharedWorker, and I see convoluted complications all over the spectrum ... so please, can you correct, fix, or change any point in the logic here? Thank you again 🙏 edit: SAB as in SharedArrayBuffer
The OPFS API currently only supports getting a sync access handle to a file from within a dedicated worker. At the moment, the fastest VFS for SQLite-wasm (which also has the fewest browser requirements and widest support) only supports a single dedicated worker connected to the database at a given time, and opening/closing connections on demand isn't an option (look up the SQLite wasm docs for more info). Because of this, if you want multi-tab support you need to elect a leader tab to host this dedicated worker. This leader tab is responsible for providing MessagePorts for the dedicated worker to other tabs (so that every tab has direct access to the dedicated worker). Browser interest indicates that the next likely improvement here will be to allow multiple workers to get sync access handles to the same file, so in the future this same approach could be used where every tab has its own dedicated worker.
the only thing a shared worker is used for is getting a MessagePort handle to the dedicated worker hosted by the leader tab. I.e. it’s only used when the leader tab changes or a new tab is initialized. See the wa-SQLite repo for examples.
every tab is the same distance from the dedicated worker. But post message is super super slow in the browser so, unfortunately (and I can’t believe this is true but we see it in production), round tripping with the server can be faster than accessing the local SQLite database. This can also happen on cheap devices because their SSDs are slow. See a Notion blog post on their adopting SQLite-wasm for Notion. For our case, we see post message slow down and cause issues with high message volume (1000s of msgs) / high payload size. Serializing data between threads is a synchronous, blocking operation.
correct, this is a problem that needs to be addressed. Our solution is to use a web lock to monitor when the leader tab is destroyed. If that happens while a request is in flight, we retry the request after a new leader is elected. I'm responding on mobile, so forgive the brevity / lack of links.
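The leader-tab mechanics in this comment (elect a leader with a web lock, retry in-flight requests once a new leader exists) could be sketched as below. This is an illustration, not Comms' code: `electLeader`, `requestWithRetry`, `LeaderLostError`, and the lock name are hypothetical, and `locks` is injected (`navigator.locks` in a browser, whose `request(name, options, callback)` keeps the lock until the callback's promise settles).
```js
// Hypothetical error a transport would raise when the leader tab disappears
// before answering (e.g. noticed because a web lock it held was released).
class LeaderLostError extends Error {}

// Every tab requests the same exclusive lock. Only one tab holds it at a time,
// and it keeps it by never settling the callback's promise; when that tab is
// closed, the browser releases the lock and grants it to the next waiting tab.
function electLeader(locks, onBecomeLeader, lockName = 'sqlite-leader') {
  return locks.request(lockName, { mode: 'exclusive' }, () => {
    onBecomeLeader();
    return new Promise(() => {}); // hold the lock for the lifetime of the tab
  });
}

// Retry once a new leader is available, but only when the failure was the
// leader vanishing; ordinary query errors are surfaced immediately.
async function requestWithRetry(sendToLeader, waitForLeader, request, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await sendToLeader(request);
    } catch (err) {
      if (!(err instanceof LeaderLostError) || attempt >= maxAttempts) throw err;
      await waitForLeader();
    }
  }
}
```
Retrying is only safe for reads or idempotent writes (an upsert qualifies); otherwise the new leader needs some way to recognize requests the old leader already committed.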
Using transferable objects (ArrayBuffer) or shared memory (SharedArrayBuffer) to pass SQLite data between tabs would be a HUGE HUGE performance unlock, since it's an order of magnitude faster than serializing data via postMessage. Assuming, of course, you're not just adding an expensive encoding / decoding step.
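To illustrate the difference between copying and transferring: a transferred ArrayBuffer moves to the receiver and the sender's copy is detached (its `byteLength` becomes 0). The sketch uses `structuredClone(value, { transfer })`, which applies the same transfer semantics as `worker.postMessage(message, [buffer])`, so it can be observed in a single thread; `transferRows` is a hypothetical name.
```js
// Moves (not copies) a buffer, the way worker.postMessage(msg, [buf]) would.
function transferRows(rowsBuffer) {
  const received = structuredClone(rowsBuffer, { transfer: [rowsBuffer] });
  // After the transfer, the sender's buffer is detached and reports length 0.
  return { received, senderByteLength: rowsBuffer.byteLength };
}
```
In a worker setup the equivalent call would be `worker.postMessage({ rows: buffer }, [buffer])`, which only pays off if producing the buffer is cheaper than building JavaScript objects in the first place.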
If I read that correctly, we're saying that postMessage (and its structuredClone operation) is slower than putting JSON into a buffer and transferring it, right? 'Cause that's what we do in coincident: the worker asks main to do something, passing along a SAB of 8 bytes (2 × Int32); index 0 is used to notify the waiting worker and the length of the resulting operation is stored at index 1. The worker then passes along a new SAB with its length padded to a multiple of Int32 (until growable SABs are a thing), main serializes the result into it and notifies again, and the worker can stop waiting after deserializing … we have great performance, but I've always thought the serializing/deserializing was the bottleneck; apparently I was wrong?
But at the end of the day, if transferring buffers already fixes the perf issue, what is SAB really needed for in a SharedWorker? Again, I'm asking because I have a branch in sabayon that polyfills SAB by falling back to buffers, but if buffers are already a solution I would rather not implement that overhead in the library, as it brings very little, and now that I've learned SAB can be shared all over the place I think it would be a huge mess to properly polyfill it in a way that wouldn't "explode" RAM.
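The two-step handshake described above can be sketched in a single thread to show the buffer layout. This is not coincident's actual code: the function names and the flag values (1 = length ready, 2 = payload ready) are assumptions, and in the real flow the worker would block on `Atomics.wait(flags, 0, ...)` between steps while the main thread does the writing.
```js
// Single-threaded sketch of the handshake; names and flag values are hypothetical.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Round a byte length up to a whole number of Int32 slots.
function paddedByteLength(length) {
  const size = Int32Array.BYTES_PER_ELEMENT;
  return Math.ceil(length / size) * size;
}

// Main thread: serialize the result and publish its byte length in the 8-byte
// control buffer ([0] = notification flag, [1] = payload length).
function publishLength(control, value) {
  const bytes = encoder.encode(JSON.stringify(value));
  const flags = new Int32Array(control);
  Atomics.store(flags, 1, bytes.length);
  Atomics.store(flags, 0, 1);
  Atomics.notify(flags, 0); // wakes a worker waiting on index 0, if any
  return bytes;
}

// Worker: allocate a payload buffer big enough for the announced length.
function allocatePayload(control) {
  const length = Atomics.load(new Int32Array(control), 1);
  return new SharedArrayBuffer(paddedByteLength(length));
}

// Main thread: copy the serialized bytes into the worker's buffer, notify again.
function writePayload(control, payload, bytes) {
  new Uint8Array(payload).set(bytes);
  const flags = new Int32Array(control);
  Atomics.store(flags, 0, 2);
  Atomics.notify(flags, 0);
}

// Worker: decode. Some browsers reject TextDecoder.decode on views over shared
// memory, so copy the bytes into a regular buffer first (slice() does that).
function readPayload(control, payload) {
  const length = Atomics.load(new Int32Array(control), 1);
  return JSON.parse(decoder.decode(new Uint8Array(payload, 0, length).slice()));
}
```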
Not quite. "putting JSON [a string] into a buffer and transfer it" adds an encoding and decoding step that would (probably definitely) be slower than just passing it normally via structuredClone [1]. But in the case of sqlite, I don't expect that the data is starting out as javascript objects. It's starting out as some sort of internal SQLite data structure before being transformed into a javascript object and returned. So in this context, if sqlite transformed it to some sort of ArrayBuffer which was returned instead, possibly that process would take a similar amount of time so you could eliminate the structuredClone cost without adding a JSON encoding cost. For the receiver you would be adding a decoding step, but seems possible that you'd still come out ahead. Obviously this is just speculation on my part.
I can't speak to this.
If I'm not mistaken, transferring an ArrayBuffer makes it available only wherever it was transferred to. As I pointed out in my previous comment with lots of links, SharedArrayBuffers are not transferable. Instead, they can be read/written from many places at the same time. The links went into considerable detail about all of this, along with some code for implementation (though I'm sure some of it is out of date now), and even some benchmarks of using this for parallelization vs. single-threaded.
To whom it might concern, I haven't officially published this yet, but it's all about making a Uint8Array buffered version of any JS variable compatible with the structured clone algorithm: https://github.com/WebReflection/buffered-clone I am planning to circumvent all variants of the
edit: it's now official on npm
to whom it might concern ... buffered clone
edit: the early benchmark was entirely wrong, but there is light at the end of the tunnel
edit 2: personal conclusions: buffered-clone is now 100% code covered and while it cannot compete with
I'm just going to share this excellent discussion from wa-sqlite here: various mechanisms were discussed and shared for creating a Leader worker that solely interacts with sqlite. They all use BroadcastChannels (browser-based PubSub) to communicate across the various main and worker threads, and also expose comlink-like interface that can remotely execute functions without any messaging stuff. rhashimoto first shared 2 implementations - SharedWorker (cant use in android chrome) and Service Worker. There's code and demos for it. But further down, two people shared their own (I think much better) versions (shared-service) and (observable-worker) that don't use either SharedWorker or Service Worker- they just use normal Web Workers + BroadcastChannel + Web Locks. Each tab creates a worker, which tries to become the leader via Web Lock. Whichever is the leader creates a separate BroadcastChannel with each worker/tab (so that the published messages dont spam all tabs). They both seem to be well-made, though the latter is much more elaborate than the other. I'll likely use the simpler one for my needs. Importantly for me, because they all use BroadcastChannel, a Service Worker should be able to trigger it - not just main thread or other workers. (Service workers only seem to be able to talk to Workers via BroadcastChannel and Client.postmessage(). I hope this helps! |
Oh, that is clever. So every tab initializes by talking to the single OPFS SAH SQLite dedicated worker over a BroadcastChannel with a static name. Then they negotiate a new BroadcastChannel, known only to the dedicated worker and that tab, so that they can communicate 1-on-1 directly, all without needing to transfer ports via a SharedWorker or ServiceWorker. This does seem like a better solution without any downsides. Thanks for sharing!
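A minimal TypeScript sketch of that handshake, assuming only the standard `BroadcastChannel` API (Node 18+ also exposes it as a global, so it can run outside a browser). The hub name, message shapes and the `serve`/`connect` helpers are hypothetical and not taken from either library. The client first opens its private channel and then announces itself on the well-known hub. The leader opens the matching private channel and replies `ready`. From then on, requests and results travel 1-on-1, tagged with ids so that concurrent calls can be matched to their replies. Because a BroadcastChannel never delivers a message back to the instance that posted it, each side needs its own instance of the private channel.

```typescript
// Hypothetical naming scheme: one well-known hub plus one private channel per client.
const HUB = "db-hub";
const privateChannelName = (clientId: string) => `${HUB}:${clientId}`;

type Handler = (...args: unknown[]) => unknown;
type Message =
  | { type: "ready" }
  | { type: "call"; id: number; method: string; args: unknown[] }
  | { type: "result"; id: number; ok: boolean; value: unknown };

// Runs only in the elected leader: accepts connections on the hub and opens a
// dedicated channel per client, so replies are not broadcast to every tab.
function serve(handlers: Record<string, Handler>): () => void {
  const hub = new BroadcastChannel(HUB);
  const opened: BroadcastChannel[] = [];
  hub.onmessage = (event: MessageEvent) => {
    const channel = new BroadcastChannel(privateChannelName(event.data.clientId));
    opened.push(channel);
    channel.onmessage = async (e: MessageEvent) => {
      const msg = e.data as Message;
      if (msg.type !== "call") return;
      try {
        const value = await handlers[msg.method](...msg.args);
        channel.postMessage({ type: "result", id: msg.id, ok: true, value });
      } catch (err) {
        channel.postMessage({ type: "result", id: msg.id, ok: false, value: String(err) });
      }
    };
    channel.postMessage({ type: "ready" });
  };
  // Returns a function that stops serving and closes every channel.
  return () => {
    hub.close();
    opened.forEach((c) => c.close());
  };
}

// Runs in every tab/worker: handshake through the hub, then a 1-on-1 RPC stub.
async function connect(clientId: string) {
  const channel = new BroadcastChannel(privateChannelName(clientId));
  const pending = new Map<number, { resolve: (v: unknown) => void; reject: (e: Error) => void }>();
  let nextId = 0;
  const ready = new Promise<void>((resolve) => {
    channel.onmessage = (e: MessageEvent) => {
      const msg = e.data as Message;
      if (msg.type === "ready") return resolve();
      if (msg.type !== "result") return;
      const call = pending.get(msg.id);
      pending.delete(msg.id);
      if (msg.ok) call?.resolve(msg.value);
      else call?.reject(new Error(String(msg.value)));
    };
  });
  // Announce ourselves only after the private channel exists, so "ready" can't be missed.
  const hub = new BroadcastChannel(HUB);
  hub.postMessage({ clientId });
  await ready;
  hub.close();
  return {
    call(method: string, ...args: unknown[]): Promise<unknown> {
      const id = nextId++;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject });
        channel.postMessage({ type: "call", id, method, args });
      });
    },
    close: () => channel.close(),
  };
}
```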
Yes, it seems quite straightforward. I never really understood the various convoluted dances that have been proposed and used here, and originally in that discussion. I suppose it's just a result of people not being aware of APIs like Web Locks and BroadcastChannel? It's important to note, though, that the developer of the more comprehensive of the two libraries replied to my comment there: they have abandoned it in favour of OPFSAdaptiveVFS. I know nothing about it, but here's some info: https://github.com/rhashimoto/wa-sqlite/tree/master/src/examples#vfs-comparison There are useful notes and demos in this discussion as well: rhashimoto/wa-sqlite#153 I'll carry on exploring the simpler implementation.
I might as well also share this library, which does some related things. It's far fancier than I need, but I'm sure some here will appreciate it or be inspired by it: https://github.com/daniel-nagy/transporter There's also a blog post about it: https://danielnagy.me/posts/Post_s2fh85ot8gqd The browser module, in particular, is probably the most relevant: https://github.com/daniel-nagy/transporter/tree/main/packages%2Fbrowser
OPFSAnyContextVFS seems to work in a SharedWorker... why did they prefer the adaptive one?
No idea. You could go ask? Though, again, SharedWorkers are largely a non-starter as long as Android Chromium doesn't support them.
It's about browser support. Worker, MessageChannel and SharedWorker have more or less the same wide compatibility (except for the latter, plus WebView shenanigans). BroadcastChannel, on the other hand, requires fairly recent iOS or browsers, although these days that should be covered. BroadcastChannel doesn't fix or replace SharedWorker. Still, at least for Service Worker-based fallbacks (like the one used in sabayon), it's probably a much better way to do what I'm already doing. Right now I use a broadcast channel implemented via Workers, with a unique identifier per worker: basically doing the whole thing myself instead of using that API. It was my fault for not checking its support earlier. I was aware it existed, but it felt easier to use basic APIs that surely work everywhere, as Workers do for me... but I'll think about switching to it if there are actual benefits.
Yes, I could. I was hoping the reason had already been discussed, but it's OK; I'll try to read up on the choice or ask explicitly.
A universally available and fully functional SharedWorker would be the holy grail. Unfortunately, Android Chrome inexplicably hasn't added it yet, so I don't see how it would ever be worth anyone building on top of it. The only viable options are the various convoluted dances that have been implemented and discussed, though some are more convoluted than others. Surely what you've done works just fine. But I really think BroadcastChannel + a leader-elected Web Worker using Web Locks can greatly simplify things, as shown in the repos I linked to above. No need for SharedWorker, Service Worker, ports, etc.

BroadcastChannel is now just as ubiquitous as any other API. As usual, Safari was dragging its feet, but it added support nearly 3 years ago. Support according to caniuse:

- MessageChannel: 98% https://caniuse.com/channel-messaging (though its members show 97% https://caniuse.com/mdn-api_messagechannel_port1)
- BroadcastChannel: 96.5% https://caniuse.com/broadcastchannel
- Web Workers: 98% https://caniuse.com/webworkers
- Web Locks API: 95% https://caniuse.com/mdn-api_lock
- SharedWorker: 44% https://caniuse.com/sharedworkers

I have no idea, though, how performance compares. It could surely suffer if you don't have a dedicated channel per tab/worker/context and end up publishing to many or all contexts. That's an easy fix, though, and global publishing is just a nice superset of functionality. I also have no idea how it works with the fancy SharedArrayBuffer, Atomics, etc. that you have been implementing in various libraries. But it's definitely worth checking out!

P.S. I'll share one more repo, which has been used in a lot of production sites for a while: https://github.com/pubkey/broadcast-channel It doesn't have a Comlink-style RPC, but it does similar things with BroadcastChannel and Web Locks.
@nickchomey what's missing is the ability to combine SharedArrayBuffer with all these other primitives. That would let workers cooperate without clashing while still being able to synchronously block the current dedicated worker until an answer is received. As an example, a single Pyodide bootstrap could then be reachable by all open tabs, so that all of them would run on a single interpreter instead of bootstrapping it multiple times... or better, coincident could work like that, although the client/server part can't work the same way, but those are implementation details. My last thoughts on this: it's absurd that we all need to find other strategies just to simulate what a SharedWorker would provide out of the box... there's really not much of an excuse for not wanting SharedWorker when all we're trying to do is emulate it via leading-tab orchestration... is anyone at Google reading this space? I wish they did!
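The "synchronously block until an answer is received" part is what `Atomics.wait`/`Atomics.notify` on a `SharedArrayBuffer` provide. Below is a minimal TypeScript sketch. To keep it runnable standalone, it uses Node's `worker_threads` for the responding thread. In a browser, the waiting side would have to be a Web Worker (browsers don't allow `Atomics.wait` on the main thread), on a cross-origin-isolated page. The buffer layout and the `blockingRequest` helper are hypothetical, and the "answer" is just the question doubled.

```typescript
import { Worker } from "node:worker_threads";

// Blocks the calling thread until another thread writes an answer into shared
// memory. Hypothetical layout: slot 0 is a "ready" flag, slot 1 holds the answer.
function blockingRequest(question: number, timeoutMs = 5000): number {
  const shared = new Int32Array(new SharedArrayBuffer(8));
  // The responder runs in its own thread; in a browser this would be another
  // Worker that received the SharedArrayBuffer via postMessage.
  const worker = new Worker(
    `
    const { workerData } = require("node:worker_threads");
    const view = new Int32Array(workerData.buffer);
    Atomics.store(view, 1, workerData.question * 2); // the "answer"
    Atomics.store(view, 0, 1);                       // mark it ready
    Atomics.notify(view, 0);                         // wake the waiting thread
    `,
    { eval: true, workerData: { buffer: shared.buffer, question } },
  );
  // Synchronous wait: no await, no callback. The loop tolerates spurious wakeups
  // and the case where the worker finished before we started waiting.
  const deadline = Date.now() + timeoutMs;
  while (Atomics.load(shared, 0) === 0) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) {
      void worker.terminate();
      throw new Error("timed out waiting for an answer");
    }
    Atomics.wait(shared, 0, 0, remaining);
  }
  return Atomics.load(shared, 1);
}
```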
@WebReflection Can you please remind me what is blocking such a leader-elected Web Worker approach (in which that worker could be the only one to bootstrap Pyodide) from using SharedArrayBuffers? I think you pointed out that SharedWorkers don't even have SAB support in Chrome, so normal Workers are the only option anyway. Is it the COOP/COEP/CORS stuff? Here's a good article from StackBlitz on the topic, including why they only work on Chromium: https://blog.stackblitz.com/posts/cross-browser-with-coop-coep/ I really do encourage everyone to vote and petition at this Chromium issue for SharedWorker support on Android: https://issues.chromium.org/issues/40290702 I just did that again and linked to this discussion and the one at sqlite.wasm; it looks like they automatically collect such links at the top of the issue. So it would be worth linking to other issues and articles that show how you're affected by this.
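For reference, the COOP/COEP part boils down to two response headers on the page: `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`. With both present, `self.crossOriginIsolated` is `true` and browsers expose `SharedArrayBuffer`. The friction the StackBlitz article deals with is that, under `require-corp`, every cross-origin subresource must opt in via CORS or a `Cross-Origin-Resource-Policy` header. Some browsers also accept `credentialless` as the COEP value. Below is a minimal TypeScript sketch; the `applyCrossOriginIsolation` helper and `HeaderSink` type are hypothetical, shaped so that Node's `http.ServerResponse` fits.

```typescript
// Both headers must be present on the page for it to be cross-origin isolated,
// which is what browsers require before exposing SharedArrayBuffer.
const CROSS_ORIGIN_ISOLATION_HEADERS: Readonly<Record<string, string>> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Anything with setHeader(name, value), e.g. Node's http.ServerResponse.
interface HeaderSink {
  setHeader(name: string, value: string): unknown;
}

function applyCrossOriginIsolation(res: HeaderSink): void {
  for (const [name, value] of Object.entries(CROSS_ORIGIN_ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
}
```

With Node's built-in server, this would look like `http.createServer((req, res) => { applyCrossOriginIsolation(res); /* serve the file */ })`. In the page, checking `self.crossOriginIsolated` tells you whether it worked.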
That's a no-go because Pyodide takes ~1 sec up to ~X secs to bootstrap. The leading tab would have its dedicated worker point at the shared Pyodide state, but close that tab and disaster happens:
Basically, all I'm saying is that this whole orchestration around Web Locks and BroadcastChannel surely solves SQLite on the Web (damn it, we already had that before; thanks, Mozilla, for nuking WebSQL!), but it stops there... anything else that doesn't rely on a persistent file system to work as expected won't benefit from any of these primitives... SharedWorker, instead, would solve all of these, from library authors' needs to users' expectations.
Gotcha. Yeah, we need SharedWorkers... Until then, we can try these convoluted "polyfills". Though, on that note, why NOT just make this a polyfill? If SharedWorker exists (everywhere except Chromium on Android), use it; otherwise, do this dance. It's still a mess, but possibly more performant and feature-rich in some contexts than doing the dance all the time. There's also the reality that people aren't really using multiple tabs at the same time on mobile, and I'm sure that's partially why Chrome hasn't added such support. In fact, I wouldn't be surprised if background tabs are automatically put to sleep, in some sense, when not in focus. Or you could make that happen yourself with some event listeners on visibilitychange. It wouldn't solve your bootstrapping problem across the board, but it's a partial fix.
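The "use SharedWorker where it exists, otherwise do the dance" idea could start with plain feature detection. Below is a minimal TypeScript sketch; the strategy names and `pickStrategy` are hypothetical. The check only looks for the globals each approach needs: `SharedWorker`, or else `BroadcastChannel` plus `navigator.locks` for a leader-elected worker. If neither is available, it falls back to a single-tab setup. The global scope is a parameter, so the decision can be tested outside a browser.

```typescript
type Strategy = "shared-worker" | "leader-elected-worker" | "single-tab-fallback";

// Inspect a global scope (globalThis by default) for the APIs each strategy needs.
function pickStrategy(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): Strategy {
  if (typeof scope.SharedWorker === "function") {
    return "shared-worker";
  }
  const nav = scope.navigator as { locks?: unknown } | undefined;
  if (typeof scope.BroadcastChannel === "function" && nav?.locks !== undefined) {
    // Elect one worker via navigator.locks and talk to it over BroadcastChannel.
    return "leader-elected-worker";
  }
  // Neither is available: fall back to a per-tab setup with no cross-tab coordination.
  return "single-tab-fallback";
}
```

On Android Chrome, which has BroadcastChannel and Web Locks but no SharedWorker, this would pick `leader-elected-worker`.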
I've written tons of polyfills, but only where I could guarantee at least some compatibility... this leading-tab dance entirely breaks all the use cases I work with daily. That makes it too fragile to interest me, and it would also suddenly break users' expectations if anyone sold it as a proper polyfill. Accordingly, I wouldn't work on it: it'd be a slippery slope that fails our users' expectations (and mine) too badly. I'd still be curious to hear from anyone trying to provide that for tasks beyond my daily ones. Anyway, I feel like I've reached an unfortunate dead end here, plus many others are involved and I don't want to take up their time any further... so here's my summary:
Have a lovely rest of the week 👋
Thanks for the great library! I'm running into some insert performance degradation similar to this issue: sqlite/sqlite-wasm#61
One of the suggestions there is to use the SAH backend. It does not allow for concurrency, but that isn't really required for my use case, so it would be nice to be able to use it.