Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dial is affected by query context cancellation #1259

Closed
jameshartig opened this issue Jul 22, 2022 · 5 comments · Fixed by #1261
Closed

Dial is affected by query context cancellation #1259

jameshartig opened this issue Jul 22, 2022 · 5 comments · Fixed by #1261

Comments

@jameshartig
Copy link
Contributor

jameshartig commented Jul 22, 2022

We noticed a lot of: failed to connect to ...: dial error (dial tcp ...: operation was canceled) and this seems to stem from the fact that puddle does p.constructResourceValue(ctx) and passes along the context from the original query. If the query is cancelled (user closed the tab, etc), we don't necessarily want to throw away the connection.

While I do think there's value in passing a context (which might contain helpful information on what originated the request and tracing information) maybe the puddle constructor (in pgxpool) should be overwriting the original context's cancellation? There are some implementations of how to ignore the cancellation out there in golang/go#40221 (Under Existing Implementations). This would make acquire-initiated connection creations match pool-health-initiated connection creattions (which just send context.Background() to p.p.CreateResource(ctx)).

What was happening to us is that there were hundreds of these dial error ... operation was canceled and we were DDOSing the DB without creating any useful connections. This was during a period where the DB was already struggling and was taking longer to create connections. This issue further exacerbated the problem by causing even more load on the DB.

@jackc
Copy link
Owner

jackc commented Jul 23, 2022

That's an interesting problem.

It looks like there are a number of home grown solutions, but it doesn't look like there is any standard solution yet.

Wrapping the existing context with something that delegated Value but ignored cancellation would be pretty trivial, but the comments mention that there are some awkward edge cases with requested scoped values unexpectedly outlasting the request.

Maybe the solution is a new background context that adds the original context as a value. Then it is available to anything that needs it. But they have to explicitly access the original context instead of it implicitly being used.

@jameshartig
Copy link
Contributor Author

Wrapping the existing context with something that delegated Value but ignored cancellation would be pretty trivial, but the comments mention that there are some awkward edge cases with requested scoped values unexpectedly outlasting the request.

I saw that but it seemed like there was some disagreement on that. Only in the case where it would've timed out would there actually be some unexpected lasting values. In the common case where it doesn't time out it would work exactly the same. Given that, it seems beneficial to as part of a trace to know that a new connection was initiated.

Maybe the solution is a new background context that adds the original context as a value. Then it is available to anything that needs it. But they have to explicitly access the original context instead of it implicitly being used.

I'm not totally opposed to this but it would need to be documented so that anyone that wants tracing or other values would need to explicitly get them from the inner context.

Overall, given the prevalence of the wrap-without-cancel approach, that seems like the better path to go, especially since most of the time it will keep the existing semantics the same and it would only change things in the failure case.

I can work on a PR for this just let me know which way to go.

@jackc
Copy link
Owner

jackc commented Jul 23, 2022

I can work on a PR for this just let me know which way to go.

As it is not something that actually affects me I will defer to you on wrap without cancel.

Base the PR against the v1 branch in puddle. Master uses generics which we don't want to pull into pgx v4.

@jameshartig
Copy link
Contributor Author

jameshartig commented Jul 23, 2022

I was actually considering doing this in pgxpool and not puddle since I'd prefer puddle send the original context in case other users of puddle want it. This seems simple enough to do in the constructor function that we send to puddle. Does that make sense?

@jackc
Copy link
Owner

jackc commented Jul 23, 2022

Oh you're right. For some reason it totally escaped me that it could be done outside. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants