feat(performance): support for Map as dataset #93
Conversation
Good work on the investigation and resulting fix! ✨ Appreciate it, looks good from my end. I'm from the Ecosystem team though, and I'm happy to merge and release this, along with new versions of @sanity/groq-store, @sanity/preview-kit and next-sanity, as soon as we get an approval from someone at team Content Lake 🙇
Thanks so much for the PR! Intuitively, the idea that a […] I think being explicit would be better here. We could introduce something like a `Dataset` interface:

```ts
interface Dataset {
  allDocuments(): [Object]
  getDocument(id: string): Object
}
```

Then we can have concrete implementations that wrap arrays, […]
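For illustration, here is a rough sketch of what concrete implementations of that interface could look like. Everything below (`SanityDocument`, `ArrayDataset`, `MapDataset`) is hypothetical naming that just follows the interface proposed in this comment; none of it is an existing groq-js API.

```ts
// Hypothetical sketch: names follow the Dataset interface proposed above,
// not any existing groq-js API.
interface SanityDocument {
  _id: string
  [key: string]: unknown
}

interface Dataset {
  allDocuments(): SanityDocument[]
  getDocument(id: string): SanityDocument | null
}

// Wraps a plain array: getDocument() is a linear scan, like today's Deref.
class ArrayDataset implements Dataset {
  constructor(private docs: SanityDocument[]) {}
  allDocuments(): SanityDocument[] {
    return this.docs
  }
  getDocument(id: string): SanityDocument | null {
    return this.docs.find((doc) => doc._id === id) ?? null
  }
}

// Wraps a Map keyed by _id: getDocument() is a single O(1) lookup.
class MapDataset implements Dataset {
  constructor(private docs: Map<string, SanityDocument>) {}
  allDocuments(): SanityDocument[] {
    return [...this.docs.values()]
  }
  getDocument(id: string): SanityDocument | null {
    return this.docs.get(id) ?? null
  }
}
```

With something like this, the evaluator only ever talks to `Dataset`, and the caller picks whichever backing store fits their data.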
Hey @atombender, I agree with your point. I'm actually using groq-store and trying to make it a bit faster on my end. Using […] Shall I try to spend some time on changing this implementation to accommodate that, or would you prefer to take this over?
I think it would be great if you could amend this PR!
@judofyr Any thoughts about what this should look like?
We've been talking about making the whole dereference operator configurable:

```js
evaluate(tree, {
  dataset,
  dereference(obj) { // todo: bikeshed the name
    // this is the default behavior:
    for (const doc of dataset) {
      if (obj._ref === doc._id) return doc
    }
    return null
  }
})
```

In […] And for the case described here you could just do […] Another extension would be to support […]
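Assuming the proposed `dereference` option existed (it is only being sketched in this thread, not a released groq-js option), a caller could swap the default linear scan for a `Map` lookup roughly like this; the query and documents are made up for illustration:

```ts
import {parse, evaluate} from 'groq-js'

// Made-up sample documents: each post references an author.
const dataset = [
  {_id: 'post-1', _type: 'post', title: 'Hello', author: {_ref: 'author-1'}},
  {_id: 'author-1', _type: 'author', name: 'Ada'},
]

// Index documents by _id once, up front.
const byId = new Map<string, (typeof dataset)[number]>()
for (const doc of dataset) byId.set(doc._id, doc)

const tree = parse('*[_type == "post"]{title, author->}')

const value = await evaluate(tree, {
  dataset,
  // Assumption: `dereference` is the option proposed above, not an existing API.
  dereference(obj: {_ref: string}) {
    return byId.get(obj._ref) ?? null
  },
})
const result = await value.get()
```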
I had a similar idea (the "B" option I mentioned in the PR description); my reason for not doing that was: […]

```js
evaluate(tree, { dataset: datasetAsArray, dereference: datasetAsMap.get })
```

This makes sense to me as well, and I'm happy to amend this PR in any direction you think would be the best :)
Yes, I'm still leaning towards this solution. With the customized […]

For consistency the signature should probably be […]
If I get it right, the deref function would be called when an object is a reference ([…]). The options I can think of are: […]
Two thoughts: […]

I think I'm leaning towards having the function have […]
Oh yeah, that would be weird :) The spec doesn't allow that either, although I could imagine […]

Great, thanks for the discussion @judofyr! I think I will open this as a separate PR, as I think there is nothing I could reuse from this branch.
Well, once we allow the dereference operator to be overridden in any way, then it's possible to make it not compliant with the spec. I think it's fine for […]
Nice!! |
Sorry for dropping this! Is this still relevant for you? If so, feel free to rebase against latest […]
As the High performance GROQ mentions, resolving references can be really slow when using a large dataset.
I ran into an issue where querying a list of documents can result in ~3200ms execution time. The query just fetches a list of documents and resolves one reference in each of them using the `->` operator, roughly like the sketch below.

After some debugging I noticed that the majority of the time is spent in the `Deref` function when resolving references. Since the current implementation iterates over the entire dataset every time it resolves a reference, each lookup costs time proportional to the dataset size, and a query that dereferences every document grows roughly quadratically, so it gets dramatically slower as the number of documents increases.
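For a concrete picture, the kind of query described here looks roughly like the following. The query and documents are made up for illustration; the `parse`/`evaluate`/`get` calls are the usual groq-js API shown elsewhere in this thread.

```ts
import {parse, evaluate} from 'groq-js'

// Made-up documents: each post holds a reference to an author.
const dataset = [
  {_id: 'post-1', _type: 'post', title: 'Hello', author: {_ref: 'author-1'}},
  {_id: 'author-1', _type: 'author', name: 'Ada'},
]

// `author->` calls Deref once per post; with an array dataset each of those
// calls scans the whole dataset, so total work grows with posts × documents.
const tree = parse('*[_type == "post"]{title, author->}')
const value = await evaluate(tree, {dataset})
const result = await value.get()
```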
I was thinking of simply indexing the documents by their `_id`s with a simple Map (`Map<ID, Document>`), so resolving references can be way faster than iterating over the whole dataset.

I'm not 100% sure what the best way to add this feature is; I was thinking of:

A) supporting a `Map<ID, Document>` as the value of the `dataset` parameter
B) adding an extra option like `indexById: Boolean` to `evaluate`, which would transform the current `dataset: Array<Document>` to `dataset: Map<ID, Document>` before actually doing the evaluation

I ended up doing A) in this PR (see the sketch below), and tested the slow query I mentioned at the beginning with these changes: it got ~25x faster, from ~3200ms to ~125ms.
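A minimal sketch of option A from the caller's side; whether `evaluate` accepts a `Map` as `dataset` is exactly what this PR proposes, so treat the Map-typed call as an assumption rather than a released API:

```ts
import {parse, evaluate} from 'groq-js'

type Doc = {_id: string; [key: string]: unknown}

const datasetAsArray: Doc[] = [
  {_id: 'post-1', _type: 'post', title: 'Hello', author: {_ref: 'author-1'}},
  {_id: 'author-1', _type: 'author', name: 'Ada'},
]

// Build the Map<ID, Document> index once, so each Deref becomes a single
// lookup instead of a scan over the whole dataset.
const datasetAsMap = new Map<string, Doc>()
for (const doc of datasetAsArray) datasetAsMap.set(doc._id, doc)

const tree = parse('*[_type == "post"]{title, author->}')
// Assumption: passing a Map here only works with the change in this PR (option A).
const value = await evaluate(tree, {dataset: datasetAsMap})
const result = await value.get()
```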
Let me know what you think!