Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move updating connectivity state outside of subchannel lock #2601

Merged
merged 2 commits into from
Feb 25, 2025

Conversation

JamesNK
Copy link
Member

@JamesNK JamesNK commented Feb 24, 2025

Fixes #2589

In connection manager and friends there are two important types of lock:

  • A lock in the connection manager
  • A lock for each subchannel

It's safe to take a subchannel lock inside the connection manager lock, but doing the opposite can lead to a deadlock.

There is one place locks were taken in the incorrect order which could cause a deadlock in rare circumstances: the channel is pick first, it has an idle subchannel, and a connection request occurs at the same time as the resolver updates.

Repoed in a unit test:

image

The fix is to move the connectivity state update outside of the subchannel lock.

@JamesNK JamesNK force-pushed the jamesnk/deadlock-20250224 branch from 81ad06f to 35edd39 Compare February 24, 2025 15:38
@JamesNK JamesNK changed the title Remove subchannel update lock Move updating connectivity state outside of subchannel lock Feb 24, 2025
@JamesNK JamesNK marked this pull request as ready for review February 24, 2025 15:50
@@ -264,6 +266,11 @@ public void RequestConnection()
}
}

if (connectionRequested)
{
UpdateConnectivityState(ConnectivityState.Connecting, "Connection requested.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this now going to cause races where someone else comes along and modifies the Subchannel state and we end up updating it to Connecting when it should stay as TransientFailure or Ready?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is fine. The logic in load balancing is designed that state can change at any point, and other parts of the system then react to it. All other places update the connectivity state outside the subchannel lock, this one is the outlier.

@JamesNK JamesNK merged commit 852a118 into grpc:master Feb 25, 2025
5 checks passed
@JamesNK JamesNK deleted the jamesnk/deadlock-20250224 branch February 25, 2025 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential deadlock in subchannel state update
2 participants