SIGABRT on Apple M1 (rosetta) #45222

Closed
haraldsteinlechner opened this issue Nov 25, 2020 · 17 comments
Labels
area-VM-coreclr os-macos-bigsur (macOS11) untriaged New issue has not been triaged by the area owner

Comments

@haraldsteinlechner

haraldsteinlechner commented Nov 25, 2020

Description

I'm not sure what the intended support status of .NET on the new Apple silicon is. Anyway, the problem I stumbled across might help in stabilising the support. It looks very similar to bullet point two in this issue: #44897

On macOS, with x64 emulation, I get the following output on the console:
RestoreState: 1335: thread_set_state(float) (os/kern) invalid argument

The crash dump looks like this: https://pastebin.com/tEj95DD4
The relevant part is:

Thread 1 Crashed:
0   ???                           	0x00007ffe95cce9bc ???
1   libsystem_kernel.dylib        	0x00007fff20390502 __pthread_kill + 10
2   libsystem_c.dylib             	0x00007fff20311720 abort + 120
3   libcoreclr.dylib              	0x0000000108cc1b99 MachExceptionInfo::RestoreState(unsigned int) + 297
4   libcoreclr.dylib              	0x0000000108cc141b SEHExceptionThread(void*) + 571
5   libsystem_pthread.dylib       	0x00007fff203be950 _pthread_start + 224
6   libsystem_pthread.dylib       	0x00007fff203ba47b thread_start + 15
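
When reading a backtrace like the one above, it can help to pull out only the frames that belong to the runtime. The following is a minimal sketch, not part of the original report. The helper names `parse_frames` and `frames_in_image` are hypothetical, and the pattern assumes the frame layout shown above (index, image name without spaces, address, symbol).

```python
import re

# One frame line: index, image name, address, then the symbol text.
# Assumption: image names contain no spaces, as in the dump above.
FRAME_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(.*)$")

# The crashed-thread excerpt quoted in the report.
SAMPLE = """\
Thread 1 Crashed:
0   ???                            0x00007ffe95cce9bc ???
1   libsystem_kernel.dylib         0x00007fff20390502 __pthread_kill + 10
2   libsystem_c.dylib              0x00007fff20311720 abort + 120
3   libcoreclr.dylib               0x0000000108cc1b99 MachExceptionInfo::RestoreState(unsigned int) + 297
4   libcoreclr.dylib               0x0000000108cc141b SEHExceptionThread(void*) + 571
5   libsystem_pthread.dylib        0x00007fff203be950 _pthread_start + 224
6   libsystem_pthread.dylib        0x00007fff203ba47b thread_start + 15
"""


def parse_frames(backtrace):
    """Parse frame lines into dictionaries; non-frame lines are skipped."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            index, image, address, symbol = match.groups()
            frames.append({
                "index": int(index),
                "image": image,
                "address": address,
                "symbol": symbol.strip(),
            })
    return frames


def frames_in_image(frames, image):
    """Return only the frames that belong to the given binary image."""
    return [frame for frame in frames if frame["image"] == image]


if __name__ == "__main__":
    for frame in frames_in_image(parse_frames(SAMPLE), "libcoreclr.dylib"):
        print(frame["index"], frame["symbol"])
```

Run against the excerpt, this lists frames 3 and 4, the two `libcoreclr.dylib` frames that sit directly below `abort`.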

I'm pretty sure the problem is in the runtime, for these reasons:

  • it does not happen on other architectures
  • when digging deeper it seems the problem is very similar to: Support .NET on Apple Silicon with Rosetta 2 emulation #44897 (known issue 2)
  • the precondition stated here:
    This means: no printf, no TRACE, no PAL allocation, no ExitProcess,
    caught my attention in combination with the following statement from the previous issue: "Rosetta 2 emulation crashes with a fatal failure when calling with thread_get_state x86_FLOAT_STATE64. This is because the emulator does not emulate AVX support, but the function should simply return an error."

Unfortunately, I have no easy-to-share minimal repro.

Configuration

  • 5.0.100
  • Mac mini M1
  • System Version: macOS 11.0 (20A2411)
  • Kernel Version: Darwin 20.1.0

Regression?

unknown

Other information

related issues:

if there is an easy fix I'm happy to try asap using my test setup.

thanks and cheers,
Harald

@Dotnet-GitSync-Bot
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

Dotnet-GitSync-Bot added the untriaged label Nov 25, 2020
@danmoseley
Member

@sdmaclea

@danmoseley
Member

Oops, I read ARM64 not AMD64

@sdmaclea
Contributor

/cc @janvorli Yes, this looks like the known issue where Rosetta 2 doesn't gracefully handle the missing AVX support during emulation.

haraldsteinlechner changed the title from "SIGABRT on Apple M1" to "SIGABRT on Apple M1 (rosetta)" Nov 25, 2020
@sdmaclea
Contributor

if there is an easy fix I'm happy to try asap using my test setup.

@haraldsteinlechner Testing/building is not trivial, but if you want to try to build and try our current patches the process would look like this.

@haraldsteinlechner
Author

haraldsteinlechner commented Nov 26, 2020

Thank you for the detailed explanation. The fixes look promising, at least to me :) In fact, I naively tried to hack out some portions I suspected of failing under Rosetta, but I even failed to compile the x64 runtime on the M1 (I'm not sure how naive this is ;)), most likely because I lack knowledge of the Mac tooling.

You have a different/real setup for cross-compiling right? As I have seen in the other issue, most tests now pass with #45226. It seems this one was not necessary: janvorli@aee81ac. In that case my test could still be useful. If by any chance a build becomes available, I would of course come back and test it. Cross compilation seems too hot for me for now :(

@janvorli
Member

@haraldsteinlechner My change for activation injection via signal is still being tested, and there are some issues we are hitting with it. Without this change, you can observe hangs in rare cases when the GC needs to suspend a thread. If you don't experience hangs, you don't need to use that one in your experiments.

@sdmaclea
Contributor

sdmaclea commented Dec 1, 2020

Two tests deadlocked w/o janvorli@aee81ac.
With the change, two different tests asserted. Apple is looking at fixing both issues for us; see the #44897 discussion.

different/real setup for cross-compiling right?

We don't have any cross-compile support in the 5.0 branch. The master (6.0) branch can be cross-compiled on either macOS platform for the other. The trick is to use Xcode 12.2 or greater and to add the architecture and /p:CrossBuild=1 to the command line.

@janvorli
Member

janvorli commented Dec 1, 2020

/p:CrossBuild=1

--cross option works too.

@haraldsteinlechner
Author

haraldsteinlechner commented Jan 24, 2021

Now that macOS 11.2 is out, I checked again and the problem has changed. I now get a different segfault:

Thread 16 Crashed:
0   ???                           	000000000000000000 0 + 0
1   libsystem_kernel.dylib        	0x00007fff20315f86 swtch_pri + 10
2   libcoreclr.dylib              	0x000000010904f185 PAL_DispatchException + 405
3   libcoreclr.dylib              	0x000000010904ec97 PAL_DispatchExceptionWrapper + 10
4   ???                           	000000000000000000 0 + 0

on: System Version: macOS 11.2 (20D53)
and dotnet --version 5.0.102 (EDIT: also happens with dotnet --version 5.0.200-preview.21073.2)

is this one familiar or known? @janvorli

EDIT: the full crash dump is here: https://pastebin.com/nxrKLT2h

@janvorli
Member

@haraldsteinlechner I might have seen this in the past, but there is too little evidence. Does it crash like this consistently? And does it print anything to the console besides crashing? I've done quite a lot of testing on the macOS 11.2 beta 2 and the only issue I've hit was a very rare assert in Rosetta 2: "assertion failed: GPR thread_set_state is unsupported while in sa_tramp. (ThreadContextRegisterState.cpp:1250 thread_set_state_gpr_64)".

@haraldsteinlechner
Author

Yes, it is 100% reproducible; it just crashes without console output. There is little chance I can trim it down to a small example. How could I help in finding the problem?

@janvorli
Member

From the log you have provided, I can see that your application was running under the .NET Core 3.1.11 runtime:

/usr/local/share/dotnet/shared/Microsoft.NETCore.App/3.1.11/libcoreclr.dylib

Is that expected?
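
The same check can be automated when several crash logs have to be compared. This is a minimal sketch under assumptions: the helper name `loaded_runtime_versions` is hypothetical, and the pattern only matches the shared-framework path layout shown above.

```python
import re

# Matches .../shared/Microsoft.NETCore.App/<version>/libcoreclr.dylib
# Assumption: the runtime was loaded from a shared-framework install like the one above.
RUNTIME_PATH_RE = re.compile(
    r"/shared/Microsoft\.NETCore\.App/([^/\s]+)/libcoreclr\.dylib"
)


def loaded_runtime_versions(crash_log):
    """Return the distinct runtime versions found in the log, in order of first appearance."""
    versions = []
    for version in RUNTIME_PATH_RE.findall(crash_log):
        if version not in versions:
            versions.append(version)
    return versions


if __name__ == "__main__":
    line = "/usr/local/share/dotnet/shared/Microsoft.NETCore.App/3.1.11/libcoreclr.dylib"
    print(loaded_runtime_versions(line))  # ['3.1.11']
```

An empty result means no shared-framework `libcoreclr.dylib` path was found in the log, which by itself does not say which runtime the process used.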

@haraldsteinlechner
Author

I managed to switch the app to .NET 5. Unfortunately, the behaviour is the same: https://pastebin.com/zR0nvQRp
There is no output except for "segmentation fault"...

@haraldsteinlechner
Author

haraldsteinlechner commented Jan 27, 2021

OK, after digging deeper I found a segfault in our own code (in the OpenGL implementation we hit an unimplemented feature). It is still a mystery, though, why it produced a crash dump like the one posted previously.

EDIT: to be clear, I cannot confirm any .NET-related crashes for now! Thanks for the good work, also in the other issues. Will ping back.

@janvorli
Member

@haraldsteinlechner thank you for the details!

@haraldsteinlechner
Author

After heavy testing we could not spot any problems. As far as I'm concerned, this can be closed. @krauthaufen

@ghost ghost locked as resolved and limited conversation to collaborators Feb 28, 2021

5 participants