Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLRJit Access Violation on when attempting to run Windows ARM64 app #69222

Closed
LoopedBard3 opened this issue May 12, 2022 · 31 comments · Fixed by #69333
Closed

CLRJit Access Violation on when attempting to run Windows ARM64 app #69222

LoopedBard3 opened this issue May 12, 2022 · 31 comments · Fixed by #69333
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@LoopedBard3
Copy link
Member

Description

Also tracked: dotnet/performance#2421
In performance runs for Windows ARM64 we are failing due to run benchmarks due to this error:

[2022/05/09 06:29:58][INFO] Process terminated. Infinite recursion during resource lookup within System.Private.CoreLib.  This may be a bug in System.Private.CoreLib, or potentially in certain extensibility points such as assembly resolve events or CultureInfo names.  Resource name: Arg_AccessViolationException
[2022/05/09 06:29:58][INFO]    at System.Environment.FailFast(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.InternalGetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.GetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.AccessViolationException..ctor()
[2022/05/09 06:29:58][INFO]    at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
[2022/05/09 06:29:58][INFO]    at System.String.Ctor(Char*)
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureData.GetLocaleInfoEx(System.String, UInt32)
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.GetUserDefaultLocaleName()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo..cctor()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.get_CachedCulturesByName()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.GetCultureInfo(System.String)
[2022/05/09 06:29:58][INFO]    at System.Resources.ManifestBasedResourceGroveler.GetNeutralResourcesLanguage(System.Reflection.Assembly, System.Resources.UltimateResourceFallbackLocation ByRef)
[2022/05/09 06:29:58][INFO]    at System.Resources.ResourceManager.CommonAssemblyInit()
[2022/05/09 06:29:58][INFO]    at System.SR.get_ResourceManager()
[2022/05/09 06:29:58][INFO]    at System.SR.InternalGetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.GetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.AccessViolationException..ctor()
[2022/05/09 06:29:58][INFO]    at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
[2022/05/09 06:29:58][INFO]    at System.String.Ctor(Char*)
[2022/05/09 06:29:58][INFO]    at System.AppContext.Setup(Char**, Char**, Int32)

Looking deeper into this error and testing our built CoreRun.exe and the hello world console app locally, we were able to get the following error and stack trace:

(32cc.44a8): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** WARNING: Unable to verify checksum for C:\Users\parkerbibus\Desktop\TestPayload\Core_Root\clrjit.dll
clrjit!getLikelyClasses+0x18134:
00007ffb`153bdaf4 b86e78ea ldr         w10,[x7,x14 lsl #2]
 # Child-SP          RetAddr               Call Site
00 (Inline Function) --------`--------     clrjit!LinearScan::RegisterSelection::calculateCoversSets+0xb8 [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 11546] 
01 (Inline Function) --------`--------     clrjit!LinearScan::RegisterSelection::try_COVERS+0xb8 [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 11096] 
02 (Inline Function) --------`--------     clrjit!LinearScan::RegisterSelection::select+0x890 [D:\a\_work\1\s\src\coreclr\jit\lsra_score.h @ 27] 
03 000000ba`d61ca990 00007ffb`153c01a0     clrjit!LinearScan::allocateReg+0x8bc [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 2754] 
04 000000ba`d61caa50 00007ffb`153bc1b8     clrjit!LinearScan::allocateRegisters+0xd08 [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 5381] 
05 000000ba`d61cab40 00007ffb`1532eccc     clrjit!LinearScan::doLinearScan+0x1c8 [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 1246] 
06 (Inline Function) --------`--------     clrjit!Compiler::compCompile::__l2::<lambda_27ba776e5f59c8b10b5aa52fbc87aa8b>::operator()+0x24 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 5059] 
07 000000ba`d61cab80 00007ffb`153ebdbc     clrjit!ActionPhase<<lambda_27ba776e5f59c8b10b5aa52fbc87aa8b> >::DoPhase+0x2c [D:\a\_work\1\s\src\coreclr\jit\phase.h @ 65] 
08 000000ba`d61cab90 00007ffb`1532c588     clrjit!Phase::Run+0x4c [D:\a\_work\1\s\src\coreclr\jit\phase.cpp @ 62] 
09 (Inline Function) --------`--------     clrjit!DoPhase+0x34 [D:\a\_work\1\s\src\coreclr\jit\phase.h @ 78] 
0a 000000ba`d61cabb0 00007ffb`1532d540     clrjit!Compiler::compCompile+0x830 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 5063] 
0b 000000ba`d61cae00 00007ffb`1532ce48     clrjit!Compiler::compCompileHelper+0x6d8 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 6729] 
0c 000000ba`d61caec0 00007ffb`1532dbdc     clrjit!Compiler::compCompile+0x4c8 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 5861] 
0d 000000ba`d61caf90 00007ffb`15335c58     clrjit!jitNativeCode+0x17c [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 7362] 
0e 000000ba`d61cb180 00007ffb`0f0609b8     clrjit!CILJit::compileMethod+0x78 [D:\a\_work\1\s\src\coreclr\jit\ee_il_dll.cpp @ 279] 
*** WARNING: Unable to verify checksum for C:\Users\parkerbibus\Desktop\TestPayload\Core_Root\coreclr.dll
0f (Inline Function) --------`--------     coreclr!invokeCompileMethodHelper+0x1f4 [D:\a\_work\1\s\src\coreclr\vm\jitinterface.cpp @ 12367] 
10 (Inline Function) --------`--------     coreclr!invokeCompileMethod+0x250 [D:\a\_work\1\s\src\coreclr\vm\jitinterface.cpp @ 12430] 
11 000000ba`d61cb1e0 00007ffb`0f0916dc     coreclr!UnsafeJitFunction+0x10c8 [D:\a\_work\1\s\src\coreclr\vm\jitinterface.cpp @ 12904] 
12 000000ba`d61cb600 00007ffb`0f0908dc     coreclr!MethodDesc::JitCompileCodeLocked+0x1ec [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 946] 
13 (Inline Function) --------`--------     coreclr!MethodDesc::JitCompileCodeLockedEventWrapper+0x850 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 817] 
14 (Inline Function) --------`--------     coreclr!MethodDesc::JitCompileCode+0xcfc [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 757] 
15 (Inline Function) --------`--------     coreclr!MethodDesc::PrepareILBasedCode+0xf68 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 426] 
16 000000ba`d61cb7a0 00007ffb`0f092c18     coreclr!MethodDesc::PrepareCode+0xf94 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 323] 
17 (Inline Function) --------`--------     coreclr!CodeVersionManager::PublishVersionableCodeIfNecessary+0x29c [D:\a\_work\1\s\src\coreclr\vm\codeversion.cpp @ 1698] 
18 000000ba`d61cba40 00007ffb`0f092714     coreclr!MethodDesc::DoPrestub+0x490 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 2106] 
19 000000ba`d61ce6a0 00007ffb`0efd11f4     coreclr!PreStubWorker+0x1c4 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 1932] 
1a 000000ba`d61ce790 00007ffb`0cd49200     coreclr!ThePreStub+0x50 [D:\a\_work\1\s\artifacts\obj\coreclr\windows.arm64.Release\vm\wks\AsmHelpers.asm @ 5028] 
*** WARNING: Unable to verify checksum for C:\Users\parkerbibus\Desktop\TestPayload\Core_Root\System.Private.CoreLib.dll
1b 000000ba`d61ce8c0 00007ffb`0cd5fe84     System_Private_CoreLib+0x209200
1c 000000ba`d61ce8e0 00007ffb`0efd1f34     System_Private_CoreLib+0x21fe84
1d 000000ba`d61ce940 00007ffb`0f0e0a60     coreclr!CallDescrWorkerInternal+0x84 [D:\a\_work\1\s\artifacts\obj\coreclr\windows.arm64.Release\vm\wks\CallDescrWorkerARM64.asm @ 4539] 
1e (Inline Function) --------`--------     coreclr!CallDescrWorkerWithHandler+0x34 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 67] 
1f 000000ba`d61ce960 00007ffb`0f00f9c0     coreclr!MethodDescCallSite::CallTargetWorker+0x2e8 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 568] 
20 (Inline Function) --------`--------     coreclr!MethodDescCallSite::Call+0x14 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.h @ 458] 
21 000000ba`d61cebb0 00007ffb`0f36a394     coreclr!CorHost2::CreateAppDomainWithManager+0x1f0 [D:\a\_work\1\s\src\coreclr\vm\corhost.cpp @ 630] 
22 000000ba`d61cf000 00007ff6`107d5588     coreclr!coreclr_initialize+0x4c4 [D:\a\_work\1\s\src\coreclr\dlls\mscoree\exports.cpp @ 254] 
23 000000ba`d61cf0e0 00007ff6`107d6778     corerun+0x5588
24 000000ba`d61cf670 00007ff6`107e65e0     corerun+0x6778
25 000000ba`d61cf780 00007ff6`107e6684     corerun!GetCurrentClrDetails+0xfe40
26 000000ba`d61cf7c0 00007ffb`744584a8     corerun!GetCurrentClrDetails+0xfee4
27 000000ba`d61cf7d0 00007ffb`74643068     KERNEL32!BaseThreadInitThunk+0x38
28 000000ba`d61cf810 00000000`00000000     ntdll!RtlUserThreadStart+0x48

Run with this error can be found here: https://dev.azure.com/dnceng/internal/_build/results?buildId=1765074&view=results, on either of the windows runs.

Reproduction Steps

Using the latest dotnet SDK on a Windows ARM64 machine:

Expected behavior

App should print hello world.

Actual behavior

Failure occurs, printing:

[2022/05/09 06:29:58][INFO] Process terminated. Infinite recursion during resource lookup within System.Private.CoreLib.  This may be a bug in System.Private.CoreLib, or potentially in certain extensibility points such as assembly resolve events or CultureInfo names.  Resource name: Arg_AccessViolationException
[2022/05/09 06:29:58][INFO]    at System.Environment.FailFast(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.InternalGetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.GetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.AccessViolationException..ctor()
[2022/05/09 06:29:58][INFO]    at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
[2022/05/09 06:29:58][INFO]    at System.String.Ctor(Char*)
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureData.GetLocaleInfoEx(System.String, UInt32)
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.GetUserDefaultLocaleName()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo..cctor()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.get_CachedCulturesByName()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.GetCultureInfo(System.String)
[2022/05/09 06:29:58][INFO]    at System.Resources.ManifestBasedResourceGroveler.GetNeutralResourcesLanguage(System.Reflection.Assembly, System.Resources.UltimateResourceFallbackLocation ByRef)
[2022/05/09 06:29:58][INFO]    at System.Resources.ResourceManager.CommonAssemblyInit()
[2022/05/09 06:29:58][INFO]    at System.SR.get_ResourceManager()
[2022/05/09 06:29:58][INFO]    at System.SR.InternalGetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.GetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.AccessViolationException..ctor()
[2022/05/09 06:29:58][INFO]    at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
[2022/05/09 06:29:58][INFO]    at System.String.Ctor(Char*)
[2022/05/09 06:29:58][INFO]    at System.AppContext.Setup(Char**, Char**, Int32)

Regression?

This last worked with runtime commit 3e5517b and first failed with f8fa9f6, this was made up of these PR's from what I could tell: #67771, #69013, and #68748.

Known Workarounds

No response

Configuration

This occurs on Windows ARM64 machines when running with the latest dotnet version. This does not seem to be an issue on non-Windows ARM64 configurations.

Other information

No response

@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 12, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 12, 2022
@ghost
Copy link

ghost commented May 12, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Also tracked: dotnet/performance#2421
In performance runs for Windows ARM64 we are failing due to run benchmarks due to this error:

[2022/05/09 06:29:58][INFO] Process terminated. Infinite recursion during resource lookup within System.Private.CoreLib.  This may be a bug in System.Private.CoreLib, or potentially in certain extensibility points such as assembly resolve events or CultureInfo names.  Resource name: Arg_AccessViolationException
[2022/05/09 06:29:58][INFO]    at System.Environment.FailFast(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.InternalGetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.GetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.AccessViolationException..ctor()
[2022/05/09 06:29:58][INFO]    at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
[2022/05/09 06:29:58][INFO]    at System.String.Ctor(Char*)
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureData.GetLocaleInfoEx(System.String, UInt32)
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.GetUserDefaultLocaleName()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo..cctor()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.get_CachedCulturesByName()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.GetCultureInfo(System.String)
[2022/05/09 06:29:58][INFO]    at System.Resources.ManifestBasedResourceGroveler.GetNeutralResourcesLanguage(System.Reflection.Assembly, System.Resources.UltimateResourceFallbackLocation ByRef)
[2022/05/09 06:29:58][INFO]    at System.Resources.ResourceManager.CommonAssemblyInit()
[2022/05/09 06:29:58][INFO]    at System.SR.get_ResourceManager()
[2022/05/09 06:29:58][INFO]    at System.SR.InternalGetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.GetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.AccessViolationException..ctor()
[2022/05/09 06:29:58][INFO]    at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
[2022/05/09 06:29:58][INFO]    at System.String.Ctor(Char*)
[2022/05/09 06:29:58][INFO]    at System.AppContext.Setup(Char**, Char**, Int32)

Looking deeper into this error and testing our built CoreRun.exe and the hello world console app locally, we were able to get the following error and stack trace:

(32cc.44a8): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** WARNING: Unable to verify checksum for C:\Users\parkerbibus\Desktop\TestPayload\Core_Root\clrjit.dll
clrjit!getLikelyClasses+0x18134:
00007ffb`153bdaf4 b86e78ea ldr         w10,[x7,x14 lsl #2]
 # Child-SP          RetAddr               Call Site
00 (Inline Function) --------`--------     clrjit!LinearScan::RegisterSelection::calculateCoversSets+0xb8 [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 11546] 
01 (Inline Function) --------`--------     clrjit!LinearScan::RegisterSelection::try_COVERS+0xb8 [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 11096] 
02 (Inline Function) --------`--------     clrjit!LinearScan::RegisterSelection::select+0x890 [D:\a\_work\1\s\src\coreclr\jit\lsra_score.h @ 27] 
03 000000ba`d61ca990 00007ffb`153c01a0     clrjit!LinearScan::allocateReg+0x8bc [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 2754] 
04 000000ba`d61caa50 00007ffb`153bc1b8     clrjit!LinearScan::allocateRegisters+0xd08 [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 5381] 
05 000000ba`d61cab40 00007ffb`1532eccc     clrjit!LinearScan::doLinearScan+0x1c8 [D:\a\_work\1\s\src\coreclr\jit\lsra.cpp @ 1246] 
06 (Inline Function) --------`--------     clrjit!Compiler::compCompile::__l2::<lambda_27ba776e5f59c8b10b5aa52fbc87aa8b>::operator()+0x24 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 5059] 
07 000000ba`d61cab80 00007ffb`153ebdbc     clrjit!ActionPhase<<lambda_27ba776e5f59c8b10b5aa52fbc87aa8b> >::DoPhase+0x2c [D:\a\_work\1\s\src\coreclr\jit\phase.h @ 65] 
08 000000ba`d61cab90 00007ffb`1532c588     clrjit!Phase::Run+0x4c [D:\a\_work\1\s\src\coreclr\jit\phase.cpp @ 62] 
09 (Inline Function) --------`--------     clrjit!DoPhase+0x34 [D:\a\_work\1\s\src\coreclr\jit\phase.h @ 78] 
0a 000000ba`d61cabb0 00007ffb`1532d540     clrjit!Compiler::compCompile+0x830 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 5063] 
0b 000000ba`d61cae00 00007ffb`1532ce48     clrjit!Compiler::compCompileHelper+0x6d8 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 6729] 
0c 000000ba`d61caec0 00007ffb`1532dbdc     clrjit!Compiler::compCompile+0x4c8 [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 5861] 
0d 000000ba`d61caf90 00007ffb`15335c58     clrjit!jitNativeCode+0x17c [D:\a\_work\1\s\src\coreclr\jit\compiler.cpp @ 7362] 
0e 000000ba`d61cb180 00007ffb`0f0609b8     clrjit!CILJit::compileMethod+0x78 [D:\a\_work\1\s\src\coreclr\jit\ee_il_dll.cpp @ 279] 
*** WARNING: Unable to verify checksum for C:\Users\parkerbibus\Desktop\TestPayload\Core_Root\coreclr.dll
0f (Inline Function) --------`--------     coreclr!invokeCompileMethodHelper+0x1f4 [D:\a\_work\1\s\src\coreclr\vm\jitinterface.cpp @ 12367] 
10 (Inline Function) --------`--------     coreclr!invokeCompileMethod+0x250 [D:\a\_work\1\s\src\coreclr\vm\jitinterface.cpp @ 12430] 
11 000000ba`d61cb1e0 00007ffb`0f0916dc     coreclr!UnsafeJitFunction+0x10c8 [D:\a\_work\1\s\src\coreclr\vm\jitinterface.cpp @ 12904] 
12 000000ba`d61cb600 00007ffb`0f0908dc     coreclr!MethodDesc::JitCompileCodeLocked+0x1ec [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 946] 
13 (Inline Function) --------`--------     coreclr!MethodDesc::JitCompileCodeLockedEventWrapper+0x850 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 817] 
14 (Inline Function) --------`--------     coreclr!MethodDesc::JitCompileCode+0xcfc [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 757] 
15 (Inline Function) --------`--------     coreclr!MethodDesc::PrepareILBasedCode+0xf68 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 426] 
16 000000ba`d61cb7a0 00007ffb`0f092c18     coreclr!MethodDesc::PrepareCode+0xf94 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 323] 
17 (Inline Function) --------`--------     coreclr!CodeVersionManager::PublishVersionableCodeIfNecessary+0x29c [D:\a\_work\1\s\src\coreclr\vm\codeversion.cpp @ 1698] 
18 000000ba`d61cba40 00007ffb`0f092714     coreclr!MethodDesc::DoPrestub+0x490 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 2106] 
19 000000ba`d61ce6a0 00007ffb`0efd11f4     coreclr!PreStubWorker+0x1c4 [D:\a\_work\1\s\src\coreclr\vm\prestub.cpp @ 1932] 
1a 000000ba`d61ce790 00007ffb`0cd49200     coreclr!ThePreStub+0x50 [D:\a\_work\1\s\artifacts\obj\coreclr\windows.arm64.Release\vm\wks\AsmHelpers.asm @ 5028] 
*** WARNING: Unable to verify checksum for C:\Users\parkerbibus\Desktop\TestPayload\Core_Root\System.Private.CoreLib.dll
1b 000000ba`d61ce8c0 00007ffb`0cd5fe84     System_Private_CoreLib+0x209200
1c 000000ba`d61ce8e0 00007ffb`0efd1f34     System_Private_CoreLib+0x21fe84
1d 000000ba`d61ce940 00007ffb`0f0e0a60     coreclr!CallDescrWorkerInternal+0x84 [D:\a\_work\1\s\artifacts\obj\coreclr\windows.arm64.Release\vm\wks\CallDescrWorkerARM64.asm @ 4539] 
1e (Inline Function) --------`--------     coreclr!CallDescrWorkerWithHandler+0x34 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 67] 
1f 000000ba`d61ce960 00007ffb`0f00f9c0     coreclr!MethodDescCallSite::CallTargetWorker+0x2e8 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 568] 
20 (Inline Function) --------`--------     coreclr!MethodDescCallSite::Call+0x14 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.h @ 458] 
21 000000ba`d61cebb0 00007ffb`0f36a394     coreclr!CorHost2::CreateAppDomainWithManager+0x1f0 [D:\a\_work\1\s\src\coreclr\vm\corhost.cpp @ 630] 
22 000000ba`d61cf000 00007ff6`107d5588     coreclr!coreclr_initialize+0x4c4 [D:\a\_work\1\s\src\coreclr\dlls\mscoree\exports.cpp @ 254] 
23 000000ba`d61cf0e0 00007ff6`107d6778     corerun+0x5588
24 000000ba`d61cf670 00007ff6`107e65e0     corerun+0x6778
25 000000ba`d61cf780 00007ff6`107e6684     corerun!GetCurrentClrDetails+0xfe40
26 000000ba`d61cf7c0 00007ffb`744584a8     corerun!GetCurrentClrDetails+0xfee4
27 000000ba`d61cf7d0 00007ffb`74643068     KERNEL32!BaseThreadInitThunk+0x38
28 000000ba`d61cf810 00000000`00000000     ntdll!RtlUserThreadStart+0x48

Run with this error can be found here: https://dev.azure.com/dnceng/internal/_build/results?buildId=1765074&view=results, on either of the windows runs.

Reproduction Steps

Using the latest dotnet SDK on a Windows ARM64 machine:

Expected behavior

App should print hello world.

Actual behavior

Failure occurs, printing:

[2022/05/09 06:29:58][INFO] Process terminated. Infinite recursion during resource lookup within System.Private.CoreLib.  This may be a bug in System.Private.CoreLib, or potentially in certain extensibility points such as assembly resolve events or CultureInfo names.  Resource name: Arg_AccessViolationException
[2022/05/09 06:29:58][INFO]    at System.Environment.FailFast(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.InternalGetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.GetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.AccessViolationException..ctor()
[2022/05/09 06:29:58][INFO]    at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
[2022/05/09 06:29:58][INFO]    at System.String.Ctor(Char*)
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureData.GetLocaleInfoEx(System.String, UInt32)
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.GetUserDefaultLocaleName()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo..cctor()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.get_CachedCulturesByName()
[2022/05/09 06:29:58][INFO]    at System.Globalization.CultureInfo.GetCultureInfo(System.String)
[2022/05/09 06:29:58][INFO]    at System.Resources.ManifestBasedResourceGroveler.GetNeutralResourcesLanguage(System.Reflection.Assembly, System.Resources.UltimateResourceFallbackLocation ByRef)
[2022/05/09 06:29:58][INFO]    at System.Resources.ResourceManager.CommonAssemblyInit()
[2022/05/09 06:29:58][INFO]    at System.SR.get_ResourceManager()
[2022/05/09 06:29:58][INFO]    at System.SR.InternalGetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.SR.GetResourceString(System.String)
[2022/05/09 06:29:58][INFO]    at System.AccessViolationException..ctor()
[2022/05/09 06:29:58][INFO]    at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
[2022/05/09 06:29:58][INFO]    at System.String.Ctor(Char*)
[2022/05/09 06:29:58][INFO]    at System.AppContext.Setup(Char**, Char**, Int32)

Regression?

This last worked with runtime commit 3e5517b and first failed with f8fa9f6, this was made up of these PR's from what I could tell: #67771, #69013, and #68748.

Known Workarounds

No response

Configuration

This occurs on Windows ARM64 machines when running with the latest dotnet version. This does not seem to be an issue on non-Windows ARM64 configurations.

Other information

No response

Author: LoopedBard3
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@hoyosjs
Copy link
Member

hoyosjs commented May 12, 2022

This is likely a regression from ff23735

@jakobbotsch
Copy link
Member

cc @kunalspathak, looks like AV during LSRA

@kunalspathak
Copy link
Member

Will take a look.

@kunalspathak kunalspathak self-assigned this May 12, 2022
@kunalspathak
Copy link
Member

Few questions:

This occurs on Windows ARM64 machines when running with the latest dotnet version

  1. I believe you are running hello world using corerun.exe, correct?
  2. Is there any special machine configuration like CRT that should be present on the machine? I am wondering how is this machine different than the CI Arm64 machine that is running our tests?

@kunalspathak
Copy link
Member

I am actually able to repro this. I will investigate and submit a fix.

@kunalspathak
Copy link
Member

I am not able to repro this on my local checked/debug build. Wondering if this is release only failure. Building release version now.

@kunalspathak
Copy link
Member

yep, its checked vs. release difference. Will investigate further.

@kunalspathak
Copy link
Member

kunalspathak commented May 13, 2022

It seems that something is off with new C++ in genRegNumFromMask().

In Release:

genRegNumFromMask(4294967296)= 4294967295

In Debug:

genRegNumFromMask(4294967296)= 32

We use the return value to access an element from an array which is of size REG_COUNT. For arm64, it is 66. That's the reason of AV. I will check which bits are off, but that also explains why we didn't see this in CI, although I am surprised we don't run coreclr release on arm64?

@kunalspathak
Copy link
Member

TLDR

Because of C++ optimization, the code in genLog2() becomes functionally incorrect which leads to AV. It transforms the code:

unsigned lo32 = ulo32(value);  // == 0 for input 0x100000000
unsigned hi32 = uhi32(value); // == 1 for input 0x100000000

if (lo32 != 0) {
    return genLog2(lo32);
} else {
    return genLog2(hi32) + 32;  // returns 32
}

into

lo32 = ...  ; // == 0 for input 0x100000000
hi32 = ... ; // == 1 for input 0x100000000

if (hi32 == 0) {
  return genLog32(hi32) + 32;
} else {
 return genLog32(lo32);  // returns 0xFF
}

Details

Consider the genRegNumFromMask() method which invokes genLog2() method with value= 0x100000000.

inline regNumber genRegNumFromMask(regMaskTP mask)
{
assert(mask != 0); // Must have one bit set, so can't have a mask of zero
/* Convert the mask to a register number */
regNumber regNum = (regNumber)genLog2(mask);
/* Make sure we got it right */
assert(genRegMask(regNum) == mask);
return regNum;
}

Pasting here for quick reference:

genLog2() and surrounding code
// Given an unsigned 64-bit value, returns the lower 32-bits in unsigned format
//
inline unsigned ulo32(unsigned __int64 value)
{
    return static_cast<unsigned>(value);
}

// Given an unsigned 64-bit value, returns the upper 32-bits in unsigned format
//
inline unsigned uhi32(unsigned __int64 value)
{
    return static_cast<unsigned>(value >> 32);
}

/*****************************************************************************
 *
 *  Given a value that has exactly one bit set, return the position of that
 *  bit, in other words return the logarithm in base 2 of the given value.
 */

inline unsigned genLog2(unsigned __int64 value)
{
    unsigned lo32 = ulo32(value);
    unsigned hi32 = uhi32(value);

    if (lo32 != 0)
    {
        assert(hi32 == 0);
        return genLog2(lo32);
    }
    else
    {
        return genLog2(hi32) + 32;
    }
}

Here is the assembly code of the relevant portion:

...
 ; This code loads the magic number for % operation. See https://godbolt.org/z/o7hEq4feK 
00007ffe`8998da70 529229a2 mov         w2,#0x914D
00007ffe`8998da74 72b759e2 movk        w2,#0xBACF,lsl #0x10  

00007ffe`8998da78 b9407b26 ldr         w6,[x25,#0x78]
00007ffe`8998da7c b9405325 ldr         w5,[x25,#0x50]
00007ffe`8998da80 f9402724 ldr         x4,[x25,#0x48]
00007ffe`8998da84 f94017fb ldr         x27,[sp,#0x28]

; coversCandidates &= ~coversCandidateBit;
00007ffe`8998da88 cb0f03e8 neg         x8,x15
00007ffe`8998da8c ea0f010d ands        x13,x8,x15
00007ffe`8998da90 aa2d03e9 mvn         x9,x13             ; x13 = 0x100000000
00007ffe`8998da94 8a0f012f and         x15,x9,x15

; ------ START OF genLog2(). x13= 0x100000000 (value of coversCandidateBit)
; It directly does `value >> 32` operation which is part of uhi32() method
; and uses its result to check if it is 0 or not.
00007ffe`8998da98 d360fdae lsr         x14,x13,#0x20      ;  x14 becomes 1
00007ffe`8998da9c 54000140 beq         clrjit!LinearScan::allocateReg+0x88c (00007ffe`8998dac4)
00007ffe`8998daa0 9ba27da8 umull       x8,w13,w2
00007ffe`8998daa4 d360fd09 lsr         x9,x8,#0x20
00007ffe`8998daa8 4b0901aa sub         w10,w13,w9
00007ffe`8998daac 0b4a0528 add         w8,w9,w10,lsr #1
00007ffe`8998dab0 53057d09 lsr         w9,w8,#5
00007ffe`8998dab4 528004a8 mov         w8,#0x25
00007ffe`8998dab8 1b08b52c msub        w12,w9,w8,w13 ; <--- w12= 0

; Below instruction loads the element hashTable[hash], where hash's value is in w12==0
; x21 is the address of hashTable (see contents below)
; Here we load hashTable[0] which is 0xff.
00007ffe`8998dabc 38ec4aa8 ldrsb       w8,[x21,w12 uxtw #0]   ; x21= 00007ffe89a1e5a8
00007ffe`8998dac0 1400000a b           clrjit!LinearScan::allocateReg+0x8b0 (00007ffe`8998dae8)
00007ffe`8998dac4 9ba27dc8 umull       x8,w14,w2            ; <--- else part starts
00007ffe`8998dac8 d360fd09 lsr         x9,x8,#0x20
00007ffe`8998dacc 4b0901ca sub         w10,w14,w9
00007ffe`8998dad0 0b4a0528 add         w8,w9,w10,lsr #1
00007ffe`8998dad4 53057d09 lsr         w9,w8,#5
00007ffe`8998dad8 528004a8 mov         w8,#0x25
00007ffe`8998dadc 1b08b92c msub        w12,w9,w8,w14
00007ffe`8998dae0 38ec4aa8 ldrsb       w8,[x21,w12 uxtw #0]
00007ffe`8998dae4 11008108 add         w8,w8,#0x20           ; + 32 present in expr `genLog2(hi32) + 32`.
; ------ END OF genLog2()
00007ffe`8998dae8 d3407d0c uxtw        x12,w8
...

Contents of hashTable

static const char hashTable[PRIME] =
{
-1, 0, 1, 26, 2, 23, 27, -1, 3, 16,
24, 30, 28, 11, -1, 13, 4, 7, 17, -1,
25, 22, 31, 15, 29, 10, 12, 6, -1, 21,
14, 9, 5, 20, 8, 19, 18
};

hashtable:
0:000> db /c 4 00007ffe89a1e5a8
00007ffe`89a1e5a8  ff 00 01 1a  ....
00007ffe`89a1e5ac  02 17 1b ff  ....
00007ffe`89a1e5b0  03 10 18 1e  ....
00007ffe`89a1e5b4  1c 0b ff 0d  ....
00007ffe`89a1e5b8  04 07 11 ff  ....
00007ffe`89a1e5bc  19 16 1f 0f  ....
00007ffe`89a1e5c0  1d 0a 0c 06  ....
00007ffe`89a1e5c4  ff 15 0e 09  ....
00007ffe`89a1e5c8  05 14 08 13  ....
00007ffe`89a1e5cc  12

Next, we try to access nextIntervalRef with returned value from genLog2() which is 0xffffff and resulting in AV. I will follow up with C++ team.

@jakobbotsch
Copy link
Member

Nice investigation, sounds like a tricky one. Can we switch to an intrinsic in genLog2 as a workaround (and perf optimization)?

@kunalspathak
Copy link
Member

Can we switch to an intrinsic in genLog2 as a workaround (and perf optimization)?

Thanks for the suggestion, I will try it out.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 13, 2022
@kunalspathak
Copy link
Member

@jkotas @agocke - Seems we don't have CI coverage to test release coreclr on arm64 because of which this bug was not detected earlier. This bug causes failure in release version on a "hello world" scenario. Any thoughts on how to add coverage so we can catch such bugs earlier?

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label May 14, 2022
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone May 14, 2022
@jkotas
Copy link
Member

jkotas commented May 14, 2022

Yes, it has been a long outstanding problem that we are not running tests against bits that we are shipping. If we are not running tests against Windows Release arm64 in the twice a day rolling builds that are expected to have full coverage, it is clearly a problem that we need to fix. cc @jakobbotsch

For PR CI coverage, we are running a subset of architecture/OS/flavor combinations by design. I do not see a problem with Windows arm64 Release missing there.

@hoyosjs
Copy link
Member

hoyosjs commented May 14, 2022

We don't - coreclr seems to be testing only checked in the rolling (just the same as PR) and libraries tests which would also have caught this run

  • release coreclr (windows_x64, OSX_x64, Linux_x64, Linux_musl_x64) + debug on PR and release on CI libs
  • release coreclr + libs windows x86

runtime tests are never run on release then, and release coreclr is never tested then on: linux/musl arm64, linux/musl arm, win-arm64, osx-arm64

note: This is only talking about the main runtime pipeline - I didn't check any of the outer/stress pipelines.

@jakobbotsch
Copy link
Member

jakobbotsch commented May 14, 2022

I'll look into expanding the matrix for rolling runs on Monday.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 17, 2022
@jakobbotsch
Copy link
Member

jakobbotsch commented May 17, 2022

I looked into this a bit more and I can see that we do have the full coverage in runtime-extra-platforms: we run release coreclr + release libraries tests in all our supported configurations there. I can see that our win-arm64 runs in that pipeline were failing with

Process terminated. Infinite recursion during resource lookup within System.Private.CoreLib.  This may be a bug in System.Private.CoreLib, or potentially in certain extensibility points such as assembly resolve events or CultureInfo names.  Resource name: Arg_AccessViolationException
   at System.Environment.FailFast(System.String)
   at System.SR.InternalGetResourceString(System.String)
   at System.SR.GetResourceString(System.String)
   at System.AccessViolationException..ctor()
   at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
   at System.String.Ctor(Char*)
   at System.Globalization.CultureData.GetLocaleInfoEx(System.String, UInt32)
   at System.Globalization.CultureInfo.GetUserDefaultLocaleName()
   at System.Globalization.CultureInfo..cctor()
   at System.Globalization.CultureInfo.get_CachedCulturesByName()
   at System.Globalization.CultureInfo.GetCultureInfo(System.String)
   at System.Resources.ManifestBasedResourceGroveler.GetNeutralResourcesLanguage(System.Reflection.Assembly, System.Resources.UltimateResourceFallbackLocation ByRef)
   at System.Resources.ResourceManager.CommonAssemblyInit()
   at System.SR.get_ResourceManager()
   at System.SR.InternalGetResourceString(System.String)
   at System.SR.GetResourceString(System.String)
   at System.AccessViolationException..ctor()
   at System.SpanHelpers.IndexOf(Char ByRef, Char, Int32)
   at System.String.Ctor(Char*)
   at System.AppContext.Setup(Char**, Char**, Int32)

until the latest run that incorporates the fix from #69333. This issue was actually pointed out in #67771 before it was merged, but it looks like it might have been misdiagnosed as another issue.

win-x64 is also failing in runtime-extra-platforms right now which looks to be #67728. The pipeline unfortunately has a lot of failures so keeping tabs on what is known and what isn't is challenging, but we should try to be more aware of this in the future.

@hoyosjs
Copy link
Member

hoyosjs commented May 17, 2022

The problem is extra-platforms is not followed as closely - but it does need to be as you say. Is there any coreclr testing on release at all?

@jkotas
Copy link
Member

jkotas commented May 17, 2022

Yes, we do run a number of legs with release coreclr build.

@hoyosjs
Copy link
Member

hoyosjs commented May 17, 2022

I meant the ones for src/test. Or is that not relevant enough to catch further issues on release?

@jkotas
Copy link
Member

jkotas commented May 17, 2022

For PR CI coverage, we do run libraries tests on release coreclr release that ensure that it is not completely broken. So testing on checked coreclr should be sufficient there.

For the rolling builds, I agree we should be running the runtime tests on release coreclr and libraries. It sounds like something that needs to be fixed.

@DrewScoggins
Copy link
Member

We have started seeing this bug again. It looks like it happened between the commits here, dd919f1...b12e288.

@kunalspathak

@DrewScoggins DrewScoggins reopened this May 20, 2022
@jakobbotsch
Copy link
Member

jakobbotsch commented May 20, 2022

Strangely we have runs from yesterday that are green:
https://dev.azure.com/dnceng/public/_build/results?buildId=1781232&view=results
(Libraries Test Run release coreclr windows arm64 Release)

@kunalspathak
Copy link
Member

@DrewScoggins - I downloaded the artifacts from the failing job and it seems to work fine for me. Is the machine in bad state or something?

@LoopedBard3
Copy link
Member Author

It looks like the most recent run was successful. I will keep watching this today over the next handful of runs to see if we keep seeing this and close this if it is fixed.

@kunalspathak
Copy link
Member

Do you see any more failures or should we go ahead and close it?

@LoopedBard3
Copy link
Member Author

We should go ahead and close this, the errors haven't been showing up and things have been running smoothly. I will go ahead and close it.

@VSadov
Copy link
Member

VSadov commented Jun 20, 2022

The fix needs to be ported to 6.0
I have just hit this bug trying to repro an unrelated 6.0 issue. Once you use a newer compiler when building 6.0 runtime, win-arm64 JIT crashes the same way.

@jakobbotsch
Copy link
Member

Seems reasonable to port it, but I would also hope the VC++ team would prioritize fixing this so that newer compilers will not hit this problem in all old source code we compile. @kunalspathak did you hear back from them?

@kunalspathak
Copy link
Member

@kunalspathak did you hear back from them?

They have already fixed the bug and it is set to "Pending Release".

https://developercommunity.visualstudio.com/t/Bad-codegen-for-Arm64-when-upgraded-to-1/10054888

I will start the process to backport it because I am not sure when the release will be out and we will update VC++ at that time.

@kunalspathak
Copy link
Member

Note, in the issue, they did suggest a workaround of disabling the newer optimization by using /d2pattern-opt-disable:1205678968 that introduced this bug, but I am hesitant of using any such workaround and rather fix the code on our end

@ghost ghost locked as resolved and limited conversation to collaborators Jul 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants