[6.0] Modify thread pool thread counting to be a bit more defensive #70479
- Port of dotnet#70478 to 6.0
- An unexpected underflow in one or more thread counts can lead to a large number of threads being created continually
- Prevented underflows in changes to thread counts, such that following an unexpected underflow, subsequent paired increments and decrements avoid repeating the underflow
- Verified by creating an unexpected underflow in the debugger
Tagging subscribers to this area: @mangod9
Approved. We will take for consideration in 6.0.x
@kouvel / @mangod9, thoughts on the failures?
Both are failing with Helix queue issues:
I will retry in case this is intermittent.
Looks like this would need a backport of #69871.
PR for the infra change is here: #70570. @jeffschwMSFT, since this is only an infra change, hoping it doesn't need a full approval process.
/azp run runtime
Azure Pipelines successfully started running 1 pipeline(s).
Customer Impact
In a few observed cases, after all worker threads had exited, an unexpected decrement was somehow applied to the tracked thread counts, driving some of the counts negative. As work is queued to the thread pool thereafter, additional threads are created, and the underflow repeats when those threads run out of work. Combined with hill climbing frequently resetting the desired count of threads, the issue keeps repeating and more and more worker threads are created. The issue appears to be very rare, but when it occurs, the process can accumulate far more than 100K threads over a relatively short span of time, with a large amount of CPU time spent spinning up and tearing down worker threads. This is a defensive fix to have the thread pool behave less erratically in the event of an unexpected underflow.
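The difference between plain and defensive counting can be illustrated with a small sketch. It is written in Python for brevity and is not the code from this PR: the class names, the `threads_to_create` helper, and the clamping strategy are all hypothetical, chosen only to show why an underflow persists across paired increments and decrements unless the count is clamped.

```python
from dataclasses import dataclass


@dataclass
class NaiveCounts:
    """Hypothetical counter that trusts every increment and decrement."""
    existing_threads: int = 0

    def thread_created(self) -> None:
        self.existing_threads += 1

    def thread_exited(self) -> None:
        # A stray decrement at zero drives the count negative.
        self.existing_threads -= 1


@dataclass
class DefensiveCounts:
    """Hypothetical counter that never lets a negative value carry forward."""
    existing_threads: int = 0

    def thread_created(self) -> None:
        # Treat an already-negative count as zero before incrementing,
        # so one increment/decrement pair repairs an earlier underflow.
        self.existing_threads = max(self.existing_threads, 0) + 1

    def thread_exited(self) -> None:
        # Clamp at zero instead of underflowing.
        self.existing_threads = max(self.existing_threads - 1, 0)


def threads_to_create(existing: int, goal: int) -> int:
    """Assumed policy: create threads until the tracked count reaches the goal."""
    return max(goal - existing, 0)


if __name__ == "__main__":
    naive = NaiveCounts()
    naive.thread_exited()            # unexpected decrement with no threads alive
    naive.thread_created()
    naive.thread_exited()            # paired operations keep the negative offset
    print("naive count:", naive.existing_threads)

    defensive = DefensiveCounts(existing_threads=-1)  # underflow forced from outside
    defensive.thread_created()
    defensive.thread_exited()        # the pair brings the count back to zero
    print("defensive count:", defensive.existing_threads)
```

With the naive counter the negative offset survives every later pair of operations, so a policy like `threads_to_create` keeps asking for more threads than the goal requires. With the clamped counter the first pair after the underflow restores a consistent count, and without an underflow both counters behave identically.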
Regression?
The issue started happening in .NET 6 with the portable thread pool. The reason for the unexpected underflow being seen is not clear. There doesn't appear to be a realistic possibility of underflow in the source code or generated machine code. Whether it is a regression is unclear at the moment.
Testing
Verified the issue and fix by creating an unexpected underflow in the debugger and monitoring the process thread count following that.
Risk
Low - If there is no underflow, the behavior is the same as before.