Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: standardize the approach used to identify messages (msgHash or msgId) #2498

Closed
Ivansete-status opened this issue Mar 4, 2024 · 5 comments
Labels
effort/weeks Estimated to be completed in a few weeks enhancement New feature or request

Comments

@Ivansete-status
Copy link
Collaborator

Problem

Nowadays we have implemented two different ways to identify a message:

  1. Message Hash (sha256( pubsubTopic + message ) ):
    https://github.com/waku-org/nwaku/blob/master/waku/waku_core/message/digest.nim#L29-L40

  2. Message ID ( sha256( message ) ):
    https://github.com/waku-org/nwaku/blob/master/waku/waku_relay/message_id.nim#L31-L33

The message_hash is extensively used in Waku side to overcome the lack of deterministic hashing in protobuf encoding.
Besides, the message_id is mostly used within GossipSub to drop an already-seen message. https://github.com/status-im/nim-libp2p/blob/349496e40fcae9f225c437662b5bbe3c6b6eb3af/libp2p/protocols/pubsub/gossipsub.nim#L474-L475

Suggested solution

This suggests two points:

  1. Use a standard way to identify a message and therefore use one single concept: msgId or msgHash.
  2. Only identify the message with the formula to compute the msg hash: sha256( pubsubTopic + message ).

Alternatives considered

  1. Stop using the concept of msgHash in Waku and only consider msgId.
  2. Stop using the concept of msgId within GossipSub and use msgHash as we are using in Waku.

For both alternatives, the proposed computation should be sha256( pubsubTopic + message ).

@jm-clius
Copy link
Contributor

jm-clius commented Mar 4, 2024

Stop using the concept of msgHash in Waku and only consider msgId.

This may not be feasible as msgId depends on encoding and decoding of protobufs, which is not globally deterministic. For schemes such as Waku Sync, we require a globally unique ID mechanism.

Some related issues and docs:
https://github.com/waku-org/pm/files/10567778/Waku_Message_UID_.v2.pdf
waku-org/pm#9
vacp2p/research#175

@Ivansete-status
Copy link
Collaborator Author

Thanks @jm-clius!
I'm keener on only using the msgHash concept within GossipSub. In that case, to avoid multiple decoding of the same message, we would need to perform the validation (which is actually decoding the message to validate) earlier and then extract the msgHash from there, and use that extracted msgHash for message deduplication in GossipSub.

@SionoiS
Copy link
Contributor

SionoiS commented Mar 4, 2024

We could use IPLD. That way we get globally unique CID for each message and various codecs (cbor, json, etc..).
Sadly nothing is implemented in Nim :(

@richard-ramos
Copy link
Member

I can only talk from my experience of using go-waku, but I think we should keep the handling for message IDs and message hashes as it is implemented right now, and maybe just rename any instance of messageId to GossipSubMessageId just so it is clear what it refers to.

The reasons why i suggest not doing this change is:

  1. Like @jm-clius mentioned, protobuffer serialization is not deterministic: protobuff generation libraries in diff languages might produce different results when serializing objects or like with nim's minprotobuff, they might not be entirely proto3 compliant with regards to how optional/default values are handled.

  2. The gossipsub message id is a concept that is used in the Gossipsub layer, and it should probably not be used in the upper layers (i.e. Waku and Application logic), besides an initial setup for setting the message ID function. This seems to be the case right now, as doing a quick search in the codebase reveals that these are the only places on which the message ID is being used image
    (there are other mentions of messageId within RLN code but seems to be an entirely different concept)

@chair28980 chair28980 added the enhancement New feature or request label Mar 6, 2024
@chair28980 chair28980 moved this to To Do in Waku Mar 6, 2024
@gabrielmer gabrielmer added the effort/weeks Estimated to be completed in a few weeks label Jun 4, 2024
@Ivansete-status
Copy link
Collaborator Author

Closing this as not needed for now

@github-project-automation github-project-automation bot moved this from To Do to Done in Waku Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/weeks Estimated to be completed in a few weeks enhancement New feature or request
Projects
Archived in project
Development

No branches or pull requests

6 participants