Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[6.0] Modify thread pool thread counting to be a bit more defensive #70479

Merged
merged 2 commits into from
Jun 11, 2022

Conversation

kouvel
Copy link
Member

@kouvel kouvel commented Jun 9, 2022

  • Port of Modify thread pool thread counting to be a bit more defensive #70478 to 6.0
  • An unexpected underflow in one or more thread counts can lead to a large number of threads to be created continually
  • Prevented underflows in changes to thread counts, such that following an unexpected underflow, subsequent paired increments and decrements would avoid repeating the underflow
  • Verified by creating an unexpected underflow in the debugger

Customer Impact

It was seen a few times that when all worker threads have exited, somehow there is a decrement on the tracked thread counts, which leads to some counts going negative. As work is queued to the thread pool thereafter, this leads to additional threads being created and for the underflow to repeat when those threads run out of work. Combined with hill climbing frequently resetting the desired count of threads, the issue keeps repeating and more and more worker threads are created. The issue appears to be very rare, but when it occurs, it can lead to the process accumulating >> 100K threads over a relatively short span of time, and with a large amount of CPU time spent in spinning up and tearing down worker threads. This is a defensive fix to have the thread pool behave less erratically in the event of an unexpected underflow.

Regression?

The issue started happening in .NET 6 with the portable thread pool. The reason for the unexpected underflow being seen is not clear. There doesn't appear to be a realistic possibility of underflow in the source code or generated machine code. Whether it is a regression is unclear at the moment.

Testing

Verified the issue and fix by creating an unexpected underflow in the debugger and monitoring the process thread count following that.

Risk

Low - If there is no underflow, the behavior is the same as before.

- Port of dotnet#70478 to 6.0
- An unexpected underflow in one or more thread counts can lead to a large number of threads to be created continually
- Prevented underflows in changes to thread counts, such that following an unexpected underflow, subsequent paired increments and decrements would avoid repeating the underflow
- Verified by creating an unexpected underflow in the debugger
@kouvel kouvel added this to the 6.0.x milestone Jun 9, 2022
@kouvel kouvel requested review from janvorli and mangod9 June 9, 2022 12:46
@kouvel kouvel self-assigned this Jun 9, 2022
@ghost
Copy link

ghost commented Jun 9, 2022

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Issue Details
  • Port of Modify thread pool thread counting to be a bit more defensive #70478 to 6.0
  • An unexpected underflow in one or more thread counts can lead to a large number of threads to be created continually
  • Prevented underflows in changes to thread counts, such that following an unexpected underflow, subsequent paired increments and decrements would avoid repeating the underflow
  • Verified by creating an unexpected underflow in the debugger
Author: kouvel
Assignees: kouvel
Labels:

area-System.Threading

Milestone: 6.0.x

@kouvel kouvel added the Servicing-consider Issue for next servicing release review label Jun 9, 2022
Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. We will take for consideration in 6.0.x

@jeffschwMSFT jeffschwMSFT added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Jun 10, 2022
@jeffschwMSFT jeffschwMSFT modified the milestones: 6.0.x, 6.0.7 Jun 10, 2022
@jeffschwMSFT
Copy link
Member

@kouvel/ @mangod9 thoughts on the failures?

@mangod9
Copy link
Member

mangod9 commented Jun 10, 2022

both are failing with Helix queue issues:

SENDHELIXJOB(0,0): error : Helix queue windows.11.amd64.clientpre.open is set for estimated removal date of 2022-06-16

I will retry if this is somehow intermittent.

@mangod9
Copy link
Member

mangod9 commented Jun 10, 2022

This would need backport of #69871 looks like

@mangod9
Copy link
Member

mangod9 commented Jun 10, 2022

PR for the infra change is here: #70570. @jeffschwMSFT since this is only a infra change hoping doesnt need a full approval process.

@carlossanlop
Copy link
Member

carlossanlop commented Jun 10, 2022

@mangod9 I'll merge the helix PR first, then I will retry the failed CI legs in this PR.

Edit: Merged the other PR. Rerunning failed legs now.

@carlossanlop
Copy link
Member

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@carlossanlop carlossanlop merged commit 2ecbdbb into dotnet:release/6.0 Jun 11, 2022
@kouvel kouvel deleted the PtpCountsDefFix6 branch July 5, 2022 17:17
@ghost ghost locked as resolved and limited conversation to collaborators Aug 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Threading Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants