
NIP-17 - Event Metadata #605

Draft · wants to merge 5 commits into master

Conversation

arthurfranca (Contributor)

NIP text here

@fiatjaf (Member) commented Jun 14, 2023

I like this and I've suggested this approach to people many times, but every time I did it wasn't well-received, so I don't know.

@arthurfranca (Contributor, Author)

For a fast user experience, it is better to show counters quickly using metadata from one of the relays instead of counting events from many relays for each counter.

In fact, a client can choose to start showing the estimates while it gathers the exact numbers in the background.
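
For illustration, a minimal sketch of that two-phase flow, assuming a hypothetical relay-injected nip17.reactions field and a countReactions helper that stands in for the client's own cross-relay counting:

// Sketch only: `nip17` is the proposal's hypothetical relay-injected
// metadata; `countReactions` is a placeholder for a client routine
// that counts actual reaction events across relays.
interface NostrEvent {
  id: string;
  kind: number;
  content: string;
  nip17?: { reactions?: number };
}

async function showReactionCount(
  event: NostrEvent,
  render: (count: number, isEstimate: boolean) => void,
  countReactions: (eventId: string) => Promise<number>
): Promise<void> {
  // 1. Show the relay-provided estimate immediately, if present.
  if (event.nip17?.reactions !== undefined) {
    render(event.nip17.reactions, true);
  }
  // 2. Count actual reaction events in the background, then replace
  //    the estimate with the exact number.
  const exact = await countReactions(event.id);
  render(exact, false);
}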

@pablof7z (Member)

NACK; this would be a hard-fork.

If there's interest in this idea, relays should send a new/separate command along with the EVENT (SUMMARY?) carrying this data, which non-supporting clients would simply ignore; event hashing would not break.

I would probably still Concept NACK this idea, but at least this way it would not be hard-forking. I think this idea could be very centralizing, but I would want to hear @cameri and others' opinions.

@fiatjaf (Member) commented Jun 15, 2023

Why a hard-fork? These fields are not covered by the hashing or signature, so they will be ignored by any parser that doesn't know about them.

On the other hand, we already have COUNT, and I agree it is kinda centralizing.

@pablof7z (Member)

Isn't the idea that this would be embedded by the relay inside the event object? So to verify the signature, the client would need to strip this .nip17 part before hashing the event? Or is the idea that this .nip17 would live somewhere else?

@staab (Member) commented Jun 15, 2023

@pablof7z serialization for signatures is defined in NIP-01 using a fixed list of event keys, ignoring any additional key attached by whomever, so this won't cause any problems.
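
To make that concrete, a sketch of the NIP-01 id computation: only the six serialized fields feed the hash, so an extra top-level key like nip17 never enters it (JSON.stringify approximates NIP-01's exact escaping rules here):

import { createHash } from "crypto";

// NIP-01: id = sha256 of the serialized array
// [0, pubkey, created_at, kind, tags, content].
// Any extra top-level key (e.g. a relay-injected "nip17") is simply
// never serialized, so it cannot break the hash or signature check.
function computeEventId(event: {
  pubkey: string;
  created_at: number;
  kind: number;
  tags: string[][];
  content: string;
  [extra: string]: unknown; // "nip17" and friends land here, ignored
}): string {
  const serialized = JSON.stringify([
    0,
    event.pubkey,
    event.created_at,
    event.kind,
    event.tags,
    event.content,
  ]);
  return createHash("sha256").update(serialized, "utf8").digest("hex");
}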

@pablof7z (Member)

I must be misunderstanding something. If the client generates the event and serializes it (obviously without that nip17 path), but then the relay injects that nip17 path into the returned event, then clients reading the event would need to strip that thing out before checking things.

I guess you guys must have in mind that this nip17 path is not injected into the event JSON but is delivered somewhere else? Where?

To be clear, this 👇 would be a hard-fork, but I guess this refers to something different?

{
  "content": "",
  "created_at": 1683463120,
  "id": "",
  "kind": 1,
  "pubkey": "",
  "sig": "",
  "tags": [],
  "nip17": {
    // stuff inserted by the relay that a client needs to strip out before checking
  }
}

@pablof7z (Member) commented Jun 15, 2023

I retract the hard-fork comment.

I still don't think this is a good idea: it increases costs on relays, relays that adopt this will get preferential treatment, etc etc etc, [insert-reasons-why-BSV-is-stupid-here]

@staab (Member) commented Jun 15, 2023

I don't know how relays would handle the extra field; it probably depends on the implementation. Coracle at least would just ignore it.

I think 1. we need something like this, and 2. this is probably not the solution. I'm still of the opinion that "extensions" should be handled by dedicated services and relays should have a mechanism for routing clients to those services based on functionality and data locality per this post (and I'm still waiting for your comment on #259 @pablof7z 😂).

Should I go ahead and write a NIP to get the ball actually rolling?

@arthurfranca (Contributor, Author) commented Jun 15, 2023

it increases costs on relays, relays that adopt this will get preferential treatment,

The increased relay cost of updating the numbers is worth it (well... each relay can decide that) if you consider that some clients may stop requesting large numbers of reactions, reposts and so on, which is better ($) for relays.

I don't think it is a strong centralizing force, because I don't expect clients to stop connecting to non-supporting relays, especially given how NIP-65 works.

@arthurfranca (Contributor, Author)

I'm still of the opinion that "extensions" should be handled by dedicated services and relays should have a mechanism for routing clients to those services based on functionality and data locality

@staab If I got it right, your routing functionality needs "extensions" that interoperate, so they need NIPs to say how things work. "Extensions" act on nostr events and speak websockets, so at the end of the day they are relays with optional NIPs implemented.

Your NIP-11 extensions PR seems compatible with this, like: { "extensions": [["NIP-17", "wss://relay.example.com"]] }

I still see no harm in some relays adding a "nip17" extra field to returned events.
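
A sketch of how a client might consume that field; the "extensions" entry is hypothetical (the PR is unmerged), while fetching the relay information document with the application/nostr+json Accept header is standard NIP-11:

// Look up a relay recommended for a given NIP via the (hypothetical)
// "extensions" field of the NIP-11 relay information document.
async function findExtensionRelay(
  relayUrl: string,
  nip: string // e.g. "NIP-17"
): Promise<string | undefined> {
  const httpUrl = relayUrl.replace(/^ws/, "http"); // wss:// -> https://
  const res = await fetch(httpUrl, {
    headers: { Accept: "application/nostr+json" },
  });
  const info = await res.json();
  // Assumed shape: { "extensions": [["NIP-17", "wss://relay.example.com"]] }
  const match = (info.extensions ?? []).find(
    (entry: [string, string]) => entry[0] === nip
  );
  return match?.[1];
}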

@staab (Member) commented Jun 15, 2023

@arthurfranca that's correct, but of course this is all the hallucinations of a deranged mind at this point. But I think it would work. So what we need is 1. optional NIPs (like this one) that extension relays can implement, and, much more importantly, 2. a routing protocol to help clients get connected with relays that implement the functionality they need. Case in point, I currently hardcode nostr.band, purplepage.es, and some other things in Coracle. It would be much better to:

  1. look at the user's relay selections (or event hints and whatnot),
  2. ask those relays who they recommend for various NIPs using the relay metadata document (they could recommend themselves),
  3. use those recommendations to fulfill COUNT, search, etc.

@vitorpamplona (Collaborator) commented Jun 15, 2023

Counts are mostly irrelevant because, ideally, clients should not rely on a single relay to download information from. And if clients get two counts from separate relays, it's impossible to know what was included in each count.

If relay A brings a post with 7 likes, and relay B brings it with 9, what do you do? Assume B knows all 7 in A plus 2 additional likes? Assume they are all different and move the count to 16? It's a mess.

Other than that, the metadata is not verifiable. Clients will have to trust relays, which is... not ideal.

@staab (Member) commented Jun 15, 2023

Counts are mostly irrelevant because, ideally, clients should not rely on a single relay to download information from.

Yes, but recommended extensions can potentially fix this. I.e., you ask your 10 relays which NIP-45 relay to ask for counts, and 6 of them recommend relay x, 4 relay y. So the client then asks x and y, reducing the overlap, or just relay x if it wants to be conservative.
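
A sketch of that tallying step; askForRecommendation is a hypothetical helper wrapping whatever discovery mechanism a routing NIP would define:

// Ask each of the user's relays which relay they recommend for a
// given NIP, count the votes, and query the top pick(s).
async function pickCountRelays(
  userRelays: string[],
  askForRecommendation: (relay: string, nip: number) => Promise<string>,
  conservative: boolean
): Promise<string[]> {
  const votes = new Map<string, number>();
  for (const relay of userRelays) {
    const recommended = await askForRecommendation(relay, 45); // NIP-45 COUNT
    votes.set(recommended, (votes.get(recommended) ?? 0) + 1);
  }
  // Rank recommended relays by vote count, descending.
  const ranked = [...votes.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([relay]) => relay);
  // Conservative: only the top pick; otherwise the top two,
  // accepting some overlap in exchange for better coverage.
  return ranked.slice(0, conservative ? 1 : 2);
}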

@arthurfranca (Contributor, Author)

@vitorpamplona These numbers are simple estimates. The client just knows/trusts that there are at least x likes. It chooses to show the x counter from relay X or the y counter from relay Y; there is no attempt to merge numbers.

For a user browsing a feed, just knowing a note has "some" likes and replies may be enough to get them interested, click the note and check out the replies.

So the idea is that even estimates can be beneficial and avoid extra requests. The client decides if it wants accuracy or speed.

Clients already won't choose 100% accuracy, which would require data from all relays.

@vitorpamplona (Collaborator)

I am not against adding this. I just don't think it's that useful, and it might lead to centralization in single relays.

It's like the COUNT filter. People try to use it but invariably go back to downloading everything, because people complain about mismatches in the UI. It's more visible in reactions. If Amethyst shows 4 replies but loads 5 when clicked, there will be people on my neck complaining about it.

But if somebody is actually using this (I suppose single-relay workspaces could use it), so be it.

@fiatjaf (Member) commented Jun 15, 2023

Should we delete the COUNT NIP and forget it has ever existed?

@staab (Member) commented Jun 15, 2023

I wrote that NIP specifically to provoke conversations like this one.

@vitorpamplona (Collaborator)

Should we delete the COUNT NIP and forget it has ever existed?

Is anyone using it?

@fiatjaf (Member) commented Jun 15, 2023

I think no one is using it, not even @staab.

@staab (Member) commented Jun 16, 2023

In fact, I am.

@staab (Member) commented Jun 16, 2023

I've proposed a potential solution to this problem a few times since February, but no one has seemed interested in talking about it, and I haven't had time to really build anything for it. See #259 if you want to take the conversation over there.

@leoperegrino commented Jun 16, 2023

What if the metadata generation was done client-side?

Clients could implement the aggregation of the data they receive, sign it, and send it back to the relay. Other clients could initially request the metadata, maybe even filtering for metadata generated by a specific trusted user. To prevent metadata from being only partially counted during generation, clients would have to wait for EOSE before aggregating and sending it back.

I think this could be achieved in two ways:

  1. with a specific event kind exchange
  2. assumed to be part of the REQ message (relays compliant with this NIP would send events and metadata)

I think the first option is ideal, since it just adds new kinds and doesn't break the expected REQ message responses.

To demonstrate 1:

Let's suppose relays A and B, and clients X and Y.

Client X asks relays A and B for a specific thread. X waits for all the events, and when EOSE is sent, if any differences are found, X can have a list of reliable relays and use the data sent by the preferred one. X can then aggregate and send back to A and B the metadata signed by itself.

Now Y asks for the same thread from A and B and can decide if it wants to:

  1. also request the metadata
  2. request metadata generated by a specific user
  3. not request metadata at all and just use the information it receives

If metadata is requested, Y would update the UI accordingly, resolving differing metadata values with a list of preferred relays/users. When all events are received, it can correct what is outdated and send back to relays A and B a new metadata event.

All in all, this reduces the burden on the relays and transfers the decision of using it or not to the clients, which is ideal. One thing to be discussed is whether relays should only keep the last metadata event they receive.
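
A sketch of what client X's aggregation step could produce, with loud assumptions: kind 9999 is a made-up placeholder for the metadata kind, and signing is left to the client's usual NIP-01 routine:

// Build an unsigned aggregated-metadata event for a thread, to be
// signed and published after EOSE. Kind 9999 is purely illustrative;
// a real NIP would have to allocate a kind.
interface UnsignedEvent {
  kind: number;
  created_at: number;
  tags: string[][];
  content: string;
}

function buildMetadataEvent(
  threadRootId: string,
  replies: number,
  reactions: number
): UnsignedEvent {
  return {
    kind: 9999, // hypothetical "aggregated metadata" kind
    created_at: Math.floor(Date.now() / 1000),
    tags: [["e", threadRootId]], // points at the thread root
    content: JSON.stringify({ replies, reactions }),
  };
}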

@staab (Member) commented Jun 16, 2023

@leoperegrino very creative solution. This could be implemented with a new event kind. These would become stale very quickly, so queries for the metadata would include a restrictive since. If a metadata event isn't found, the client can then manually request all the metadata and sign an up to date meta event. This of course trades off low latency for low bandwidth, but it would be highly reliable, since clients could ask for multiple metadata events from reputable sources and use the median stat or something.
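
A sketch of both refinements, reusing the hypothetical kind 9999 from above; the #e and since filter fields are standard NIP-01:

// Query only fresh metadata events for a thread.
function metadataFilter(threadRootId: string, maxAgeSeconds: number) {
  return {
    kinds: [9999], // same hypothetical kind as above
    "#e": [threadRootId],
    since: Math.floor(Date.now() / 1000) - maxAgeSeconds,
  };
}

// Take the median of the reported reply counts so one bad or stale
// source can't skew the displayed number.
function medianReplyCount(metadataEvents: { content: string }[]): number {
  const counts = metadataEvents
    .map((e) => JSON.parse(e.content).replies as number)
    .sort((a, b) => a - b);
  const mid = Math.floor(counts.length / 2);
  return counts.length % 2 ? counts[mid] : (counts[mid - 1] + counts[mid]) / 2;
}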

@leoperegrino commented Jun 16, 2023

@staab thank you for the reply.

I think the since filter and the statistical analysis are very promising.

A few comments/suggestions:

  1. The since filter could also be implemented by relays to save disk storage.
  2. The statistical analysis could also be a way to filter out low-quality relays.
  3. Maybe overkill, but clients could keep a local history of which relays/users are sending the most outdated/up-to-date metadata.
  4. The UI could show that this is an estimate in a subtle way, such as a slightly different likes icon. When EOSE is received, the client updates to the correct value and changes to the standard icon.

@leoperegrino

If a metadata event isn't found, the client can then manually request all the metadata and sign an up to date meta event

To be precise, I think you meant:

If a metadata event isn't found, the client can then manually request all the DATA and sign an up to date meta event

When, and IF, we use this idea for a NIP, the term metadata would refer just to the aggregation of the original data.

@AsaiToshiya (Collaborator)

Small thing: "NIP-17" is already used in #324.

@arthurfranca (Contributor, Author)

@AsaiToshiya no problem, I can update the PR if the #324 PR gets merged first.

@arthurfranca (Contributor, Author)

@leoperegrino creative indeed; it would be an additional option for clients to fetch such counter events from friends. It could be a new event kind with the same fields from NIP-17. But the counter accuracy would depend on the moment the event was viewed by a user: if near the note's publishing moment, few likes, few replies.

However, flooding relays with thousands of conflicting metadata counter events from thousands of users may not be a good idea. Maybe requiring the "expiration" tag to be set 48 hours ahead could help (most notes get kinda stale after that), but I don't know if it would be enough to make this a good solution.
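
A sketch of that requirement using the standard NIP-40 "expiration" tag; the 48-hour horizon is the suggestion above, not a spec value:

// NIP-40 expiration tag: ["expiration", "<unix timestamp>"].
// Relays that honor NIP-40 drop the event after this time, which
// bounds how long conflicting counter events can accumulate.
function expirationTag(hoursAhead: number): string[] {
  const expiresAt = Math.floor(Date.now() / 1000) + hoursAhead * 3600;
  return ["expiration", String(expiresAt)];
}

// Usage with the hypothetical metadata event sketched earlier:
//   tags: [["e", threadRootId], expirationTag(48)]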

@DanConwayDev (Contributor)

Counts are mostly irrelevant because, ideally, clients should not rely on a single relay to download information from. And if clients get two counts from separate relays, it's impossible to know what was included in each count.

What about using an array of event ids instead of a count?

It would enable clients to combine responses across subscribed relays.

A shortened version of the event id could be used to save space. This could potentially be very short, trading occasional inaccuracy due to collisions against space usage.

This would be more expensive for relays to maintain, but potentially worth it for client performance?
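
A sketch of the merge a client could do under this scheme; collisions between shortened ids can only undercount, never double-count:

// Union the shortened-id sets reported by each relay and dedupe.
// If relay A reports 7 likes and relay B reports 9, with the 7
// overlapping, the merged count is 9, not 16.
function mergeLikeCounts(perRelaySuffixes: string[][]): number {
  const seen = new Set<string>();
  for (const suffixes of perRelaySuffixes) {
    for (const suffix of suffixes) {
      seen.add(suffix);
    }
  }
  return seen.size;
}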

@arthurfranca (Contributor, Author)

@DanConwayDev A list of id suffixes (because a prefix may have "0" PoW chars) could work. I don't know the ideal char length though. Also don't know if relays would be willing to store this extra data; let's see what others say.

Maybe limit the set length to a max like 200(?), which would still be useful for the less popular notes (the majority of notes), to save disk space even more? Above that, the client can pick the higher regular int counter or the counter from the relay it trusts more.

What about view count by IP? I suppose not all counters would use your set approach.

@leoperegrino commented Jun 18, 2023

I think this NIP, the way it is right now, increases relay responsibilities. It does not comply with the principle of keeping relays dumb, and it adds state management to them.

However, self-signed metadata is very simple to implement in the client. I don't think it will flood the network, since the application can aggregate and send metadata only when entering a particular event/thread. When in a feed, such as a global one, you are not expected to request all replies right away, so you just consume others' metadata and don't produce your own. As I suggested before, UIs can just indicate to the user that they have an estimate, not real data.

And on the server side, the relay can just set a threshold for how many copies of the same metadata event will be kept. This avoids increasing relay CPU usage with aggregation, solves the million-queries problem and lets relays focus only on event throughput.

@leoperegrino commented Jun 18, 2023

To be fair, we could have both: relay-generated and self-signed metadata, each with its own NIP. The relay could announce which one is supported. But I guess the protocol should focus on scaling without server requirements, which should lead to greater decentralization.

@arthurfranca (Contributor, Author)

Since the relay would already be storing extra metadata, I've made it searchable with NIP extensions. Like this:
["REQ", <sub_id>, { kinds: [1], ..., nip17: { language: ["en"], country: ["US", "GB"] } }]

For now just language and country. This should help a lot with better global/featured feeds.
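
For illustration, a sketch of a client issuing that extended REQ over a plain WebSocket; relays that don't implement the extension would presumably ignore the unknown nip17 filter key, so the request degrades to an ordinary kind-1 query:

// Subscribe to English-language kind-1 notes from US/GB, assuming a
// relay that implements the proposed nip17 filter extension.
const ws = new WebSocket("wss://relay.example.com");
ws.onopen = () => {
  ws.send(
    JSON.stringify([
      "REQ",
      "feed-en",
      {
        kinds: [1],
        limit: 20,
        nip17: { language: ["en"], country: ["US", "GB"] },
      },
    ])
  );
};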

@arthurfranca arthurfranca mentioned this pull request Jul 8, 2023
@arthurfranca (Contributor, Author)

Added some filters as optional NIP extensions to help clients find trending events. For example:
["REQ", <sub_id>, { "kinds": [1], "since": 1689545303, "limit": 20, "nip17": { "replies": ">100" } }]

If you think this is too much, I could keep at least the language one, which is very helpful.

@arthurfranca (Contributor, Author)

Simplified NIP text.

Now there are just 3 fields and 3 filters that must be supported by relays.

@arthurfranca arthurfranca marked this pull request as draft May 9, 2024 16:08