Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

/federation/v1/send returns 404 and logs at INFO for "Could not find event" errors #10597

Open
MurzNN opened this issue Aug 13, 2021 · 2 comments
Labels
A-Federation A-Logging Synapse's logs (structured or otherwise). Not metrics. T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@MurzNN
Copy link

MurzNN commented Aug 13, 2021

Description:

Currently "Could not find event" federation errors are logged with INFO level, here is example of such string:

2021-08-12 12:00:57,947 - synapse.http.server - 88 - INFO - PUT-1922933- <SynapseRequest at 0x7f2bf47735b0 method='PUT' uri='/_matrix/federation/v1/send/1628611386500' clientproto='HTTP/1.0' site=12011> SynapseError: 404 - Could not find event $Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4

But this error may cause the large problem with whole federation process, such as described in #10589 and #6867.

So better is to treat this error with ERROR logging level, or, at least, WARNING, for easier figuring out that problem, because many admins try to check errors only by filtering logs by ERROR level, because of huge amount of log lines, and even didn't see that problem.

@richvdh richvdh changed the title Treat "Could not find event" federation errors as ERROR instead of INFO /federation/v1/send returns 404 and logs at INFO for "Could not find event" errors Aug 13, 2021
@richvdh
Copy link
Member

richvdh commented Aug 13, 2021

Both problems (the fact this is logged at INFO, and the fact that /_matrix/federation/v1/send is returning a 404 rather than a 500) are somewhat tied up with the fact that the Synapse codebase tends to pick HTTP response codes in the depths of utility functions, rather than in the REST layer.

In this case, the storage layer raises a NotFoundError (essentially a SynapseError with code 404) when it fails to find a given event in the database. This propagates to the REST layer, which sees a SynapseError with a 4xx code, which indicates a client error, so something we should log at INFO rather than WARNING.

This is the correct behaviour when the error happens as the result of a REST call to /_matrix/client/r0/room/{room_id}/room/{event_id}, but not really in /_matrix/federation/v1/send.

I think the only tractable way out of this mess is:

  1. Start using a separate exception hierarchy for unexpected internal errors. If these propagate to the REST layer, they should be turned into 500 errors with an error code of M_UNKNOWN.
  2. Change get_event to raise such an error instead of NotFoundError when the event is not found
  3. Update the REST layer for /_matrix/client/r0/room/{room_id}/room/{event_id} to generate a 404 if the event is not found (actually, most of the code is already there, but unused!)

@erikjohnston
Copy link
Member

It's worth noting for context here that we should not be propagating any errors up the REST layer for /send once we've successfully parsed the request body. All the handling of new PDUs and EDUs should be wrapped in try/except to ensure that e.g. a broken room doesn't break inbound federation entirely. So there's a separate bug here that we're even allowing an exception to propagate up to the REST layer.

That's not to say we shouldn't fix our exception handling.

@erikjohnston erikjohnston added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label Aug 25, 2021
@MadLittleMods MadLittleMods added the A-Logging Synapse's logs (structured or otherwise). Not metrics. label Dec 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Federation A-Logging Synapse's logs (structured or otherwise). Not metrics. T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
Development

No branches or pull requests

4 participants