Implement full GC suspension #67805

Closed
9 tasks done
jkotas opened this issue Sep 17, 2021 · 7 comments

Comments

@jkotas
Member

jkotas commented Sep 17, 2021

NativeAOT is missing an implementation of full GC suspension logic. This manifests as a hang when one thread of the program is running a tight loop without any GC probes and another thread triggers a GC.
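To make the failure mode concrete, here is a minimal sketch of why a loop without GC probes stalls a suspension request. It is not the NativeAOT runtime's code: `Worker`, `g_trap_threads`, `gc_probe`, `run_loop` and `try_suspend` are hypothetical names, and the suspender uses a timeout only so that the demo terminates instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical names throughout; an illustration, not the runtime's implementation.
struct Worker {
    std::atomic<bool> stop{false};      // lets the demo end the thread
    std::atomic<bool> at_probe{false};  // true while the worker is parked at a probe
};

std::atomic<bool> g_trap_threads{false};  // "a GC wants every thread stopped"

// A GC probe: if suspension is requested, report it and wait to be released.
void gc_probe(Worker& w) {
    if (!g_trap_threads.load(std::memory_order_acquire)) return;
    w.at_probe.store(true, std::memory_order_release);
    while (g_trap_threads.load(std::memory_order_acquire))
        std::this_thread::yield();
    w.at_probe.store(false, std::memory_order_release);
}

// A tight loop that either contains a probe or does not.
void run_loop(Worker& w, bool has_probe) {
    while (!w.stop.load(std::memory_order_relaxed)) {
        if (has_probe) gc_probe(w);
    }
}

// Requests suspension and reports whether the worker reached a probe
// before the timeout expired.
bool try_suspend(bool has_probe, std::chrono::milliseconds timeout) {
    assert(timeout.count() > 0);
    Worker w;
    std::thread t(run_loop, std::ref(w), has_probe);

    g_trap_threads.store(true, std::memory_order_release);
    auto deadline = std::chrono::steady_clock::now() + timeout;
    bool suspended = false;
    while (std::chrono::steady_clock::now() < deadline) {
        if (w.at_probe.load(std::memory_order_acquire)) { suspended = true; break; }
        std::this_thread::yield();
    }

    g_trap_threads.store(false, std::memory_order_release);  // release the worker
    w.stop.store(true, std::memory_order_relaxed);           // then end the demo
    t.join();
    return suspended;
}
```

With a probe in the loop, `try_suspend` should succeed as soon as the worker is scheduled. Without one, the request is never acknowledged, and a suspender that waits without a timeout would wait forever.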

More context:

Workitems:

@VSadov
Member

VSadov commented May 10, 2022

In CoreCLR we have 3 mechanisms for suspension:

  • polling
  • return hijacking
  • redirection

We generally guarantee that any loop can be suspended via at least one mechanism. More than one is better.

Ideally we should have the same guarantee on NativeAOT - at least one suspension mechanism should be available.
Permitting "unsuspendable" loops could result in unpredictable pauses/hangs.
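Of the three mechanisms, polling is the simplest to illustrate outside a runtime. The sketch below shows one suspend/resume cycle built on a mutex and a condition variable. `SuspendGate` and `counter_frozen_during_suspension` are hypothetical names, and the design is an assumption made for illustration, not how CoreCLR or NativeAOT implement polling. Return hijacking and redirection need platform-specific control over another thread's context and are not shown.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch of polling-based suspension; not CoreCLR or NativeAOT code.
class SuspendGate {
public:
    // Called by workers at safe points (the "poll").
    void poll() {
        if (!requested_.load(std::memory_order_acquire)) return;  // fast path
        std::unique_lock<std::mutex> lock(m_);
        ++parked_;
        cv_.notify_all();                                        // tell the suspender
        cv_.wait(lock, [this] { return !requested_.load(); });   // wait for resume
        --parked_;
    }

    // Called by the suspending thread; returns once `count` workers are parked.
    void suspend_all(int count) {
        std::unique_lock<std::mutex> lock(m_);
        requested_.store(true, std::memory_order_release);
        cv_.wait(lock, [&] { return parked_ == count; });
    }

    void resume_all() {
        { std::lock_guard<std::mutex> lock(m_); requested_.store(false); }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::atomic<bool> requested_{false};
    int parked_ = 0;  // guarded by m_
};

// Runs `n` workers that bump a shared counter, suspends them, and reports
// whether the counter stayed unchanged while they were suspended.
bool counter_frozen_during_suspension(int n) {
    assert(n > 0);
    SuspendGate gate;
    std::atomic<long> counter{0};
    std::atomic<bool> stop{false};
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&] {
            while (!stop.load(std::memory_order_relaxed)) {
                counter.fetch_add(1, std::memory_order_relaxed);
                gate.poll();  // safe point on every iteration
            }
        });

    gate.suspend_all(n);
    long before = counter.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    long after = counter.load();

    stop.store(true);   // workers exit once resumed
    gate.resume_all();
    for (auto& t : workers) t.join();
    return before == after;
}
```

The sketch only works because every loop iteration reaches `poll()`. A loop that never polls would keep `suspend_all` waiting indefinitely, which is why a runtime needs a fallback such as hijacking or redirection for code without polls.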

@VSadov VSadov self-assigned this May 10, 2022
@GSPP

GSPP commented May 11, 2022

Go has been plagued by unsuspendable loops. For a long time, the team was unwilling to sacrifice performance to make loops strictly suspendable. This caused instability of applications under certain circumstances.

This issue has attracted about 300 comments from people who encountered a problem with that behavior.

@LakshanF
Contributor

LakshanF commented Jun 2, 2022

A likely helpful test case to go with this issue, which reproduces on a win-x64 machine. The steps to reproduce are as follows:

  1. Build the libraries from the steps outlined in building and Running library tests.
  2. cd src\libraries\System.Collections.Concurrent\tests
  3. dotnet.cmd build -c Release /t:Test /p:TestNativeAot=true

The test should hang. The selected call stacks below show the issues with GC suspension. Thread 34 looks interesting in that its call stack seems broken for this test scenario:

  34  Id: b948.beb4 Suspend: 3 Teb: 00000064`72f40000 Unfrozen
 # Child-SP          RetAddr               Call Site
00 00000064`746ff310 00007ff6`64f3fa7f     System_Collections_Concurrent_Tests!System_Collections_Concurrent_Tests_System_Collections_Concurrent_Tests_ConcurrentStackTests___c__DisplayClass17_0___Concurrent_Push_TryPop_WithSuspensions_b__0+0x1e [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Collections.Concurrent\tests\ConcurrentStackTests.cs @ 193] 
01 00000064`746ff360 000001c4`b13a8bd0     System_Collections_Concurrent_Tests!ThreadStore::WaitForSuspendComplete+0x3f
02 00000064`746ff368 00000000`00000000     0x000001c4`b13a8bd0

  36  Id: b948.b2d0 Suspend: 2 Teb: 00000064`72f44000 Unfrozen
 # Child-SP          RetAddr               Call Site
00 00000064`74afef38 00007ffd`b088fbbb     ntdll!NtGetContextThread+0x14
01 00000064`74afef40 00007ff6`64f4595b     KERNELBASE!GetThreadContext+0xb
02 (Inline Function) --------`--------     System_Collections_Concurrent_Tests!PalGetThreadContext+0x14 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp @ 321] 
03 00000064`74afef70 00007ff6`64f3e7f6     System_Collections_Concurrent_Tests!PalHijack+0x7b [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp @ 408] 
04 00000064`74aff590 00007ff6`64f3f9c0     System_Collections_Concurrent_Tests!Thread::Hijack+0x66 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\thread.cpp @ 643] 
05 00000064`74aff5c0 00007ff6`64f44cd5     System_Collections_Concurrent_Tests!ThreadStore::SuspendAllThreads+0x130 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\threadstore.cpp @ 239] 
06 00000064`74aff620 00007ff6`64f408d0     System_Collections_Concurrent_Tests!GCToEEInterface::SuspendEE+0xa5 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\gcrhenv.cpp @ 811] 
07 00000064`74aff660 00007ff6`65173855     System_Collections_Concurrent_Tests!RhGetTotalAllocatedBytesPrecise+0x10 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\GCHelpers.cpp @ 299] 
08 00000064`74aff6b0 00007ff6`653d24cd     System_Collections_Concurrent_Tests!System_Collections_Concurrent_Tests_System_Collections_Concurrent_Tests_ConcurrentStackTests___c__DisplayClass17_0___Concurrent_Push_TryPop_WithSuspensions_b__1+0x45 [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Collections.Concurrent\tests\ConcurrentStackTests.cs @ 205] 
09 00000064`74aff740 00007ff6`653dc991     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_ExecutionContext__RunFromThreadPoolDispatchLoop+0x3d [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\ExecutionContext.cs @ 268] 
0a 00000064`74aff790 00007ff6`653d8012     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_Tasks_Task__ExecuteWithThreadLocal+0x231 [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\Tasks\Task.cs @ 2349] 
0b 00000064`74aff8b0 00007ff6`653ce353     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch+0x1f2 [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\ThreadPoolWorkQueue.cs @ 730] 
0c 00000064`74aff940 00007ffd`b30a9d5a     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_ThreadPool__DispatchCallback+0x43 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\System.Private.CoreLib\src\System\Threading\ThreadPool.Windows.cs @ 373] 
0d 00000064`74aff980 00007ffd`b3057386     ntdll!TppWorkpExecuteCallback+0x13a
0e 00000064`74aff9d0 00007ffd`b10754e0     ntdll!TppWorkerThread+0x686
0f 00000064`74affcc0 00007ffd`b304485b     KERNEL32!BaseThreadInitThunk+0x10
10 00000064`74affcf0 00000000`00000000     ntdll!RtlUserThreadStart+0x2b

 29  Id: 8d30.1510 Suspend: 1 Teb: 00000016`04a1a000 Unfrozen
 # Child-SP          RetAddr               Call Site
00 00000016`061ff6f8 00007ffd`b0862a0e     ntdll!NtWaitForSingleObject+0x14
01 00000016`061ff700 00007ff7`2350ebc1     KERNELBASE!WaitForSingleObjectEx+0x8e
02 00000016`061ff7a0 00007ff7`2350efe8     System_Collections_Concurrent_Tests!Thread::WaitForGC+0x41 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\thread.cpp @ 90] 
03 (Inline Function) --------`--------     System_Collections_Concurrent_Tests!Thread::ReversePInvokeAttachOrTrapThread+0xc7 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\thread.cpp @ 1220] 
04 00000016`061ff7d0 00007ff7`2350ef03     System_Collections_Concurrent_Tests!RhpReversePInvokeAttachOrTrapThread2+0xd8 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\thread.cpp @ 1387] 
05 00000016`061ff800 00007ff7`2399d35e     System_Collections_Concurrent_Tests!RhpReversePInvoke+0x93 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\thread.cpp @ 1401] 
06 00000016`061ff830 00007ff7`23515851     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_Thread__OnThreadExit+0xe [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\System.Private.CoreLib\src\System\Threading\Thread.NativeAot.Windows.cs @ 130] 
07 00000016`061ff870 00007ff7`2350f68a     System_Collections_Concurrent_Tests!PalDetachThread+0x41 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\windows\PalRedhawkMinWin.cpp @ 210] 
08 00000016`061ff8a0 00007ffd`b3094a0c     System_Collections_Concurrent_Tests!ThreadStore::DetachCurrentThread+0x4a [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\threadstore.cpp @ 162] 

24  Id: 8d30.8334 Suspend: 1 Teb: 00000016`04a0e000 Unfrozen
 # Child-SP          RetAddr               Call Site
00 (Inline Function) --------`--------     System_Collections_Concurrent_Tests!YieldProcessorNormalizedForPreSkylakeCount+0x18 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\yieldprocessornormalized.h @ 173] 
01 00000016`05bff400 00007ff7`23514cd5     System_Collections_Concurrent_Tests!ThreadStore::SuspendAllThreads+0x182 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\threadstore.cpp @ 262] 
02 00000016`05bff460 00007ff7`235108d0     System_Collections_Concurrent_Tests!GCToEEInterface::SuspendEE+0xa5 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\gcrhenv.cpp @ 811] 
03 00000016`05bff4a0 00007ff7`23743855     System_Collections_Concurrent_Tests!RhGetTotalAllocatedBytesPrecise+0x10 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\GCHelpers.cpp @ 299] 
04 00000016`05bff4f0 00007ff7`239a24cd     System_Collections_Concurrent_Tests!System_Collections_Concurrent_Tests_System_Collections_Concurrent_Tests_ConcurrentStackTests___c__DisplayClass17_0___Concurrent_Push_TryPop_WithSuspensions_b__1+0x45 [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Collections.Concurrent\tests\ConcurrentStackTests.cs @ 205] 
05 00000016`05bff580 00007ff7`239ac991     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_ExecutionContext__RunFromThreadPoolDispatchLoop+0x3d [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\ExecutionContext.cs @ 268] 
06 00000016`05bff5d0 00007ff7`239a8012     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_Tasks_Task__ExecuteWithThreadLocal+0x231 [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\Tasks\Task.cs @ 2349] 
07 00000016`05bff6f0 00007ff7`2399e353     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch+0x1f2 [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\ThreadPoolWorkQueue.cs @ 730] 
08 00000016`05bff780 00007ffd`b30a9d5a     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_ThreadPool__DispatchCallback+0x43 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\System.Private.CoreLib\src\System\Threading\ThreadPool.Windows.cs @ 373] 


16  Id: 8d30.97e8 Suspend: 1 Teb: 00000016`04bfd000 Unfrozen
 # Child-SP          RetAddr               Call Site
00 00000016`053fb908 00007ffd`b0862a0e     ntdll!NtWaitForSingleObject+0x14
01 00000016`053fb910 00007ff7`2350ebc1     KERNELBASE!WaitForSingleObjectEx+0x8e
02 00000016`053fb9b0 00007ff7`2350f053     System_Collections_Concurrent_Tests!Thread::WaitForGC+0x41 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\thread.cpp @ 90] 
03 00000016`053fb9e0 00007ff7`2350ffbb     System_Collections_Concurrent_Tests!RhpWaitForGC2+0x33 [D:\work\Core\CurrentWork3\runtime\src\coreclr\nativeaot\Runtime\thread.cpp @ 974] 
04 00000016`053fba10 00007ff7`239a726c     System_Collections_Concurrent_Tests!RhpGcPollRare+0x2b
05 00000016`053fbaa0 00007ff7`239a3ec8     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_SpinWait__SpinOnceCore+0xdc [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\SpinWait.cs @ 238] 
06 00000016`053fbae0 00007ff7`239aeef8     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_ManualResetEventSlim__Wait_4+0xc8 [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\ManualResetEventSlim.cs @ 528] 
07 00000016`053fbb80 00007ff7`239aeb4f     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_Tasks_Task__WaitAllBlockingCore+0xf8 [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\Tasks\Task.cs @ 4932] 
08 00000016`053fbbf0 00007ff7`2372eccb     System_Collections_Concurrent_Tests!S_P_CoreLib_System_Threading_Tasks_Task__WaitAllCore+0x2bf [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\Tasks\Task.cs @ 4847] 
09 00000016`053fbc90 00007ff7`240df18e     System_Collections_Concurrent_Tests!System_Collections_Concurrent_Tests_System_Collections_Concurrent_Tests_ConcurrentStackTests__Concurrent_Push_TryPop_WithSuspensions+0xbb [D:\work\Core\CurrentWork3\runtime\src\libraries\System.Collections.Concurrent\tests\ConcurrentStackTests.cs @ 210] 
0a 00000016`053fbcd0 00007ff7`239141e8     System_Collections_Concurrent_Tests!Internal_CompilerGenerated__Module___InvokeRetV+0x3e

@LakshanF
Contributor

This no longer reproduces (#70831) in the latest build with the current merges.

@VSadov
Member

VSadov commented Jul 19, 2022

Async suspension on unix-x64 is now functional with #71187 merged.

@VSadov
Member

VSadov commented Aug 6, 2022

With #73216 merged, all supported platforms should be able to reliably suspend now.

@VSadov
Member

VSadov commented Aug 13, 2022

All the tasks planned for this workitem are done. I think we can close this now as Complete.

@ghost ghost locked as resolved and limited conversation to collaborators Sep 14, 2022