-
Notifications
You must be signed in to change notification settings - Fork 30
IPLD Selector Thoughts #272
Comments
I've actually been approaching this from the other end (broad to narrow). From the most abstract standpoint, we'd want a selector to be a recursive function defined as follows: struct Selector {
Node Cid
Func SelectorFunc
}
struct SearchResult {
Found Node
NotFound Selector
}
type SelectorFunc func(node Node) []Selector
func ApplySelector(s Selector) <-chan SearchResult {
// TODO: Don't be stupid (this is not real code)
// TODO: Dedup with, e.g., a bloom filter.
output := make(chan SearchResult)
apply := func(s Selector) {
n, err := dag.Get(s.Node)
if err != nil {
output <- SearchResult { NotFound: s }
return
}
for n := range s.Func(n) {
go apply(s)
}
}
go apply(s)
return output
} This ensures that:
One large problem with this version is that one can make a selector that's exponential in the number of nodes (linear in the number of unique paths). Ideally, we'd want
We could also just say that a server may choose to not traverse a node more than once (or some Given all this, we'd (at most) need a query language that can inspect a node and spit out a list of children to inspect and the selectors to run on them. |
Actually, I believe there is a better solution to the A selector would be (abstractly) type Selector struct {
Node Cid
Func SelectorFunc
Children []*SelectorFunc
} |
(Draft of a write up from today's conversation with @Stebalien - feel free to edit this directly or copy for a follow-up issue.) Limited and Generalized Selectors
Limited Selector
Generalized SelectorSelectors here could be implemented by allowing a node to be externally created by a trusted cluster computer and allowing it to run arbitrary software.
Execution and Traversal
Selector Syntax and Functionality
|
Comment on "Open problem 1": I discussed this with Juan and realized that doing this won't be quite as bad as I had thought. The client will have to follow along with what the server is doing anyways so, while the server will have to keep some state, it won't have to send it back to the client. All the server has to do is say "I don't have node X". At that point, the client will know precisely what needs to be executed at node X (it's executing the exact same selector) so it should be able to generate the appropriate sub-selector to pick up where the current server left off. |
@Stebalien, on your comment on Open Problem 1: In case I didn't explain that well, my example would be if some selector said "for any node named |
Third option: Simplified version of @whyrusleeping's initial proposal. type PathQuery struct {
Path string // includes the namespace (/ipfs)
Depth int // < 0 means recrusive, 0 means just the path nodes.
} Open questions on the simplified version:
|
@Stebalien Yeah, my proposal can't easily do that. For that, we would want to be able to filter the query by CID codec. Something like "dont send me any raw blocks". Or maybe "send me anything with children" |
@Stebalien I don't understand the "all path terminals" case: Does it mean getting a whole subtree without the leaves? And another question: if requesting a whole subtree, is it always clear which fields point to the children? E.g. in your "backbone" case, |
Sorry, I meant all root nodes addressable by a path in the chosen namespace. For example, in IPFS (the However, that isn't really a general purpose solution as we'd really like to say "download the directory tree but not the files". Unfortunately, the concept of directories is unixfs specific.
Given my simplified proposal, you can't. That's why I left that example as an open question. To handle cases like that, we'd need to be able to express path patterns like
😡 @whyrusleeping suggesting abuse of IPLD multicodecs, what has the world come to... 😭 That aside, there's no guarantee the leaf nodes will actually be raw nodes (and, with IPLD datastructures, they often won't). Furthermore, we'd really rather not download any of the internal nodes either, we just want the backbone. |
I've put some more thought into the problem of "which link is the one I want to traverse if I want to return a whole subtree". So it might look like this: type PathQuery struct {
// includes the namespace (/ipfs)
Path string
// The field to traverse if you want a whole subtree.
// If empty, return just the path nodes
Follow string
// Only used if `Follow` is not empty (default to 0)
// > 0 means maximum depth
// < 0 means depth counted from the leaf
// E.g. -1 would mean the whole subtree without the leaf nodes
MaxDepth int
} If the negative |
@Stebalien sorry, Should I be less compromising? I just want a thing |
@whyrusleeping I agree that the way to go forward is to just make a thing and iterate on it. However, I don't want to be a complete hypocrite and do something I tell every IPLD user not to do. |
Just putting this here as an example where a domain-specific project has taken the idea of XPath and built it out for its own purposes: http://hl7.org/fhirpath/ |
This is probably the best place to leave this, but i've been thinking through different usecases for ipld selectors and wrote up my thoughts:
ipld selectors
In order of complexity, here are the types of IPLD selectors we will need.
Basic Paths
Returns the object referenced by
d
(single object) at the path/a/b/c
belowH
, as well as the merkle proof toH
.Unbounded Recursion
Returns the entire subgraph referenced by
d
at the path/a/b/c
belowH
, as well as the merkle proof toH
.Bounded Recursion
Imagine a structure with the form:
Essentially a linked list. We want to be able to query through a potentially
infinite linked list. The simple form would be 'get the next four nodes' and
that could naively look like:
We could instead write this as:
But what if instead, we wanted 'All nodes from H'?
And what if I wanted 'All nodes from H until H2'?
Maybe this could look like:
Syntax
I don't care about the syntax of writing these down by hand, primarily because
i don't really need to ever do that. My usage of these will be entirely in
code. What I do care about is the data structure that will represent these
internally.
Should allow for most of what I want. Here are the above examples translated
into this form:
Multipath Selectors
I also think it might be nice to have selectors that specify multiple paths at a time, but the number of usecases for that is too small and the complexity too high that I don't really want it to block progress on the really simple and important ones (especially just the simple path one which we desperately need).
The text was updated successfully, but these errors were encountered: