-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Clarify (room_id, event_id)
global uniqueness
#13701
Changes from all commits
25fa1c4
14c1dd5
4fb453f
5dd24f3
893d4c5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Clarify `(room_id, event_id)` global uniqueness and how we should scope our database schemas. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -191,3 +191,27 @@ There are three separate aspects to this: | |
flavour will be accepted by SQLite 3.22, but will give a column whose | ||
default value is the **string** `"FALSE"` - which, when cast back to a boolean | ||
in Python, evaluates to `True`. | ||
|
||
|
||
## `event_id` global uniqueness | ||
|
||
In room versions `1` and `2` it's possible to end up with two events with the | ||
same `event_id` (in the same or different rooms). After room version `3`, that | ||
can only happen with a hash collision, which we basically hope will never | ||
happen. | ||
|
||
There are several places in Synapse and even Matrix APIs like [`GET | ||
/_matrix/federation/v1/event/{eventId}`](https://spec.matrix.org/v1.1/server-server-api/#get_matrixfederationv1eventeventid) | ||
where we assume that event IDs are globally unique. | ||
|
||
But hash collisions are still possible, and by treating event IDs as room | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I actually think hash collisions from sheer probability are not much of a justifiable problem (maybe SHA256 will get defeated one day I suppose...?) I expect the main problem is probably intentional collisions (esp in v1 rooms), where namespacing events by room means that we don't let a bad actor interfere with any rooms they're not in. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
🤷 Probably, maybe The rest of my reply is just linking stuff from my own curiosity: SHA-1 attack, https://github.blog/2017-03-20-sha-1-collision-detection-on-garden.eu.org/ Other reading:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a bit confused about where we've ended up on this thread: are hash collisions (feasibly) possible or not? Room v1 and v2 have bigger problems than event-id clashes between rooms. The solution to that is to stop using v1 and v2 rooms, not to arrange the entire database schema and matrix API around a half-assed fix to it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Probably not feasible but I wouldn't rule it out one day.
I'm confused by this. Do we prefer There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That's the entire discussion here, and I don't think we have a clear conclusion. Personally, I don't really see the point in including
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was discussed in the backend chapter sync as well which also brought up #13771.
This PR captures the tribal knowledge you mentioned in,
@reivilibre's number investigation is a good enough to disprove the sheer chance that a client and server run into a collision. I'm less convinced there won't be a way to exploit things in the future (targeted attack) but we can update this part of the doc to not call it out as much. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. well, we probably need to discuss this further when I'm back from leave. #12892 moves in exactly the opposite direction to that suggested here. |
||
scoped, we can reduce the possibility of a hash collision. When scoping | ||
`event_id` in the database schema, it should be also accompanied by `room_id` | ||
(`PRIMARY KEY (room_id, event_id)`) and lookups should be done through the pair | ||
`(room_id, event_id)`. | ||
|
||
There has been a lot of debate on this in places like | ||
https://github.com/matrix-org/matrix-spec-proposals/issues/2779 and | ||
[MSC2848](https://github.com/matrix-org/matrix-spec-proposals/pull/2848) which | ||
has no resolution yet (as of 2022-09-01). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems rather sad considering they are very much not and it's not very hard to cause a conflict :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can hope for MSC2848 🙌