-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make FindNode according to spec #205
Comments
After further testing the FindNode call I noticed that indeed sometimes nodes would be returned that were not the closest. A whole bucket could be missing even from the returned values. Now, why do we need this The Kademlia paper describes an implementation that starts off from one k-bucket, and keep splitting the bucket as more nodes are discovered and added. The bucket splits only on the part of the binary tree that our own id belongs too (same prefix). Resulting eventually in k-buckets per log base 2 distance. Well, not really, as nodes with ids in the lower distance ranges will never be found. And because of this an optimisation is done where buckets will also split even if the nodes own id does not have the same prefix (to avoid highly unbalanced branches). Now some implementations take a more simplified approach. They simply directly create buckets for each possible logarithmic distance (e.g. here 1->256). Some implementations also just don't create buckets with logarithmic distance < x, as the lower the distances goes, the less chance there is that you will find nodes for it. Now, discoveryv4 its FindNode call would request k closest nodes. As does original Kademlia. Now, in Discoveryv5 the approach was taken to pass the logarithmic distance instead of the NodeId as parameter for the FindNode call. And to only return nodes that match that logarithmic distance. This was done to not put the trust at the node for selecting the closest nodes. Possibly a difference in implementation, but probably more important a security issue. See also: https://github.com/ethereum/devp2p/blob/master/discv5/discv5-rationale.md#115-guard-against-kademlia-implementation-flaws The result is, that in an implementation which just stores buckets per logarithmic distance, it simply has to return the bucket. In our split-bucket implementation, this can't be done and we are still doing the closest search, which is also why we need the This FindNode change will also have an effect on the efficiency of the network. Work will be moved from the receiver of FindNodes to the requester. But this also means more network traffic, as less nodes will potentially be passed around, and thus more requests will be needed for a lookup (adding bandwidth and latency). This might be a concern for mobile devices. |
this fine description belongs in the source code ;) |
According to spec now after #223 + tests added. |
The current FindNode call will respond with ENRs closest to distance. According to spec it should only be ENRs with that distance. This is because we reuse the discovery v4 code where it should indeed respond with all closest nodes (<= k)
Next to that, there are some weird findings in vacp2p/research#19 . It should be verified if this is due to bugs in the
findNode
implementation.The text was updated successfully, but these errors were encountered: