Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failure baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh #72755

Closed
VincentBu opened this issue Jul 25, 2022 · 18 comments
Closed
Assignees
Labels
Milestone

Comments

@VincentBu
Copy link
Contributor

VincentBu commented Jul 25, 2022

Run: runtime-coreclr gcstress-extra 20220723.1

Failed test:

coreclr Linux x64 Checked gcstress0xc_jitminopts_heapverify1 @ Ubuntu.1804.Amd64.Open

- baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh

coreclr OSX arm64 Checked gcstress0xc_jitminopts_heapverify1 @ OSX.1200.ARM64.Open

- JIT/Methodical/Arrays/lcs/lcs2_d/lcs2_d.cmd

coreclr OSX arm64 Checked gcstress0xc_jitstress1 @ OSX.1200.ARM64.Open

- JIT/Methodical/Arrays/lcs/lcs2_r/lcs2_r.cmd

coreclr OSX arm64 Checked gcstress0xc_zapdisable_heapverify1 @ OSX.1200.ARM64.Open

- JIT/Methodical/Boxing/boxunbox/KeepAliveBoxOpt/KeepAliveBoxOpt.cmd

Error message:

Process terminated. Parallel crash in main thread
at System.Environment.FailFast(System.String)
at ParallelCrash.Main(System.String[])

cmdLine:/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 7/24/2022 12:39:10 AM, end: 7/24/2022 2:09:10 AM)

Return code:      -100
Raw output file:      /datadisks/disk1/work/AD4109E9/w/9F5508E7/uploads/Reports/baseservices.exceptions/simple/ParallelCrash/ParallelCrash.output.txt
Raw output:
BEGIN EXECUTION
/datadisks/disk1/work/AD4109E9/p/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false ParallelCrash.dll ''
[createdump] Gathering state for process 14588 corerun
[createdump] Crashing thread 000038fc signal 00000006
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.14588.dmp
[createdump] Written 219791360 bytes (53660 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 1438ms

cmdLine:/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 7/24/2022 12:39:10 AM, end: 7/24/2022 2:09:10 AM)
Test Harness Exitcode is : -100
To run the test:

set CORE_ROOT=/datadisks/disk1/work/AD4109E9/p
/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
Expected: True
Actual:   False


Stack trace
   at baseservices_exceptions._simple_ParallelCra_ParallelCra_._simple_ParallelCrash_ParallelCrash_sh()
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Queued OS Arch Column1
2022-07-24T00:09:19.457Z ubuntu.1804.amd64.open.rt x64 runtime-coreclr gcstress-extra Checked-gcstress0xc_jitminopts_heapverify1
2022-07-17T14:50:14.532Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-07-10T11:00:50.23Z osx.1200.arm64.open arm64 runtime-coreclr crossgen2-composite gcstress Checked-gcstress0xc_jitminopts_heapverify1
2022-06-26T05:34:58.745Z osx.1200.arm64.open arm64 runtime-coreclr r2r-extra Checked-gcstress0xc
2022-06-19T14:53:47.255Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-06-18T23:07:33.226Z ubuntu.1804.amd64.open.rt x64 runtime-coreclr gcstress-extra Checked-gcstress0xc_zapdisable_heapverify1
2022-05-29T14:55:06.304Z ubuntu.1804.armarch.open arm runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-06-12T08:37:37.57Z osx.1200.arm64.open arm64 runtime-coreclr crossgen2-composite gcstress Checked-gcstress0xc_zapdisable_jitstress2
2022-05-29T00:10:21.895Z osx.1200.arm64.open arm64 runtime-coreclr gcstress-extra Checked-gcstress0xc_jitminopts_heapverify1
@VincentBu VincentBu added arch-arm64 os-linux Linux OS (any supported distro) os-mac-os-x macOS aka OSX GCStress arch-x64 blocking-clean-ci-optional Blocking optional rolling runs labels Jul 25, 2022
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jul 25, 2022
@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Jul 30, 2022
@mangod9 mangod9 added this to the 7.0.0 milestone Jul 30, 2022
@VincentBu
Copy link
Contributor Author

Failed again in: runtime-coreclr gcstress-extra 20220731.1

Failed Test:

coreclr Linux arm64 Checked gcstress0xc_jitstress2 @ (Ubuntu.1804.Arm64.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm64v8-20210531091519-97d8652

- baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh

Error message:

Process terminated. Parallel crash in main thread
at System.Environment.FailFast(System.String)
at ParallelCrash.Main(System.String[])
apply_reg_state: ip and cfa unchanged; stopping here (ip=0x7f7b5e5b68)

cmdLine:/root/helix/work/workitem/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 8/1/2022 12:40:15 AM, end: 8/1/2022 2:10:15 AM)

Return code:      -100
Raw output file:      /root/helix/work/workitem/uploads/Reports/baseservices.exceptions/simple/ParallelCrash/ParallelCrash.output.txt
Raw output:
BEGIN EXECUTION
/root/helix/work/correlation/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false ParallelCrash.dll ''
[createdump] Gathering state for process 1758 corerun
[createdump] Crashing thread 000006de signal 00000006
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.1758.dmp
[createdump] Written 219217920 bytes (53520 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 3569ms

cmdLine:/root/helix/work/workitem/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 8/1/2022 12:40:15 AM, end: 8/1/2022 2:10:15 AM)
Test Harness Exitcode is : -100
To run the test:

set CORE_ROOT=/root/helix/work/correlation
/root/helix/work/workitem/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
Expected: True
Actual:   False


Stack trace
   at baseservices_exceptions._simple_ParallelCra_ParallelCra_._simple_ParallelCrash_ParallelCrash_sh()
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)

@VincentBu
Copy link
Contributor Author

Queued OS Arch Column1
2022-08-01T10:39:41.436Z osx.1200.arm64.open arm64 runtime-coreclr crossgen2-composite gcstress Checked-gcstress0xc_jitminopts_heapverify1
2022-08-01T00:31:13.607Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress-extra Checked-gcstress0xc_jitstress2
2022-07-24T00:09:19.457Z ubuntu.1804.amd64.open.rt x64 runtime-coreclr gcstress-extra Checked-gcstress0xc_jitminopts_heapverify1
2022-07-17T14:50:14.532Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-07-10T11:00:50.23Z osx.1200.arm64.open arm64 runtime-coreclr crossgen2-composite gcstress Checked-gcstress0xc_jitminopts_heapverify1
2022-06-26T05:34:58.745Z osx.1200.arm64.open arm64 runtime-coreclr r2r-extra Checked-gcstress0xc
2022-06-19T14:53:47.255Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-06-18T23:07:33.226Z ubuntu.1804.amd64.open.rt x64 runtime-coreclr gcstress-extra Checked-gcstress0xc_zapdisable_heapverify1

@AntonLapounov
Copy link
Member

@mikem8361 This test intentionally crashes and then it seems to hang on the close(parent_pipe); line in the PROCCreateCrashDump function. That line was added in #69663. Could you please take a look?

libpthread_2_27+0x11d5c
libcoreclr!PROCCreateCrashDump+0x338 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2460] 
libcoreclr!PROCCreateCrashDumpIfEnabled+0x5d0 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2637] 
libcoreclr!PROCAbort+0x20 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2664] 
libcoreclr!PROCEndProcess+0x304 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 1346] 
libcoreclr!TerminateProcess+0x90 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 1241] 
libcoreclr!SafeExitProcess+0x180
libcoreclr!EEPolicy::HandleFatalError+0x190 [/__w/1/s/src/coreclr/vm/eepolicy.cpp @ 797] 
libcoreclr!SystemNative::GenericFailFast+0x334 [/__w/1/s/src/coreclr/classlibnative/bcltype/system.cpp @ 291] 
libcoreclr!SystemNative::FailFast+0x10c [/__w/1/s/src/coreclr/classlibnative/bcltype/system.cpp @ 310] 

@mikem8361
Copy link
Member

mikem8361 commented Aug 7, 2022 via email

@mikem8361
Copy link
Member

mikem8361 commented Aug 7, 2022

Sorry, I was looking at the PR diff for the line numbers not the current process.cpp source. You are right that line 2460 in the current source is the close(parent_pipe). I need to investigate why close on this pipe hangs.

@AntonLapounov
Copy link
Member

I think you are actually correct, because those line numbers are for the 20220731.1 build. The crashing process should be small as we run this test individually.

@mikem8361
Copy link
Member

If the stack trace is from a core dump generated by createdump, the crashing thread will be waiting at that point on the waitpid.

Isn't the failure from whatever called the FailFast? I don't think this issue is createdump related.

@AntonLapounov
Copy link
Member

This test intentionally calls Environment.FailFast:

Environment.FailFast("Parallel crash in main thread");

The question is why the test hangs indefinitely after that call:

cmdLine:/root/helix/work/workitem/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 8/1/2022 12:40:15 AM, end: 8/1/2022 2:10:15 AM)

#69663 was merged on May 26th. According to the table above, the test started to time out on May 29th (2022-05-29). Could that be related?

@ghost
Copy link

ghost commented Aug 8, 2022

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

Run: runtime-coreclr gcstress-extra 20220723.1

Failed test:

coreclr Linux x64 Checked gcstress0xc_jitminopts_heapverify1 @ Ubuntu.1804.Amd64.Open

- baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh

coreclr OSX arm64 Checked gcstress0xc_jitminopts_heapverify1 @ OSX.1200.ARM64.Open

- JIT/Methodical/Arrays/lcs/lcs2_d/lcs2_d.cmd

coreclr OSX arm64 Checked gcstress0xc_jitstress1 @ OSX.1200.ARM64.Open

- JIT/Methodical/Arrays/lcs/lcs2_r/lcs2_r.cmd

coreclr OSX arm64 Checked gcstress0xc_zapdisable_heapverify1 @ OSX.1200.ARM64.Open

- JIT/Methodical/Boxing/boxunbox/KeepAliveBoxOpt/KeepAliveBoxOpt.cmd

Error message:

Process terminated. Parallel crash in main thread
at System.Environment.FailFast(System.String)
at ParallelCrash.Main(System.String[])

cmdLine:/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 7/24/2022 12:39:10 AM, end: 7/24/2022 2:09:10 AM)

Return code:      -100
Raw output file:      /datadisks/disk1/work/AD4109E9/w/9F5508E7/uploads/Reports/baseservices.exceptions/simple/ParallelCrash/ParallelCrash.output.txt
Raw output:
BEGIN EXECUTION
/datadisks/disk1/work/AD4109E9/p/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false ParallelCrash.dll ''
[createdump] Gathering state for process 14588 corerun
[createdump] Crashing thread 000038fc signal 00000006
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.14588.dmp
[createdump] Written 219791360 bytes (53660 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 1438ms

cmdLine:/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 7/24/2022 12:39:10 AM, end: 7/24/2022 2:09:10 AM)
Test Harness Exitcode is : -100
To run the test:

set CORE_ROOT=/datadisks/disk1/work/AD4109E9/p
/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
Expected: True
Actual:   False


Stack trace
   at baseservices_exceptions._simple_ParallelCra_ParallelCra_._simple_ParallelCrash_ParallelCrash_sh()
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Queued OS Arch Column1
2022-07-24T00:09:19.457Z ubuntu.1804.amd64.open.rt x64 runtime-coreclr gcstress-extra Checked-gcstress0xc_jitminopts_heapverify1
2022-07-17T14:50:14.532Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-07-10T11:00:50.23Z osx.1200.arm64.open arm64 runtime-coreclr crossgen2-composite gcstress Checked-gcstress0xc_jitminopts_heapverify1
2022-06-26T05:34:58.745Z osx.1200.arm64.open arm64 runtime-coreclr r2r-extra Checked-gcstress0xc
2022-06-19T14:53:47.255Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-06-18T23:07:33.226Z ubuntu.1804.amd64.open.rt x64 runtime-coreclr gcstress-extra Checked-gcstress0xc_zapdisable_heapverify1
2022-05-29T14:55:06.304Z ubuntu.1804.armarch.open arm runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-06-12T08:37:37.57Z osx.1200.arm64.open arm64 runtime-coreclr crossgen2-composite gcstress Checked-gcstress0xc_zapdisable_jitstress2
2022-05-29T00:10:21.895Z osx.1200.arm64.open arm64 runtime-coreclr gcstress-extra Checked-gcstress0xc_jitminopts_heapverify1
Author: VincentBu
Assignees: mikem8361
Labels:

arch-arm64, os-linux, os-mac-os-x, GCStress, arch-x64, area-Diagnostics-coreclr, area-ExceptionHandling-coreclr, blocking-clean-ci-optional

Milestone: 7.0.0

@hoyosjs
Copy link
Member

hoyosjs commented Aug 9, 2022

Mike, I see this in an old log:

Process terminated. Parallel crash in main thread
   at System.Environment.FailFast(System.String)
   at ParallelCrash.Main(System.String[])
apply_reg_state: ip and cfa unchanged; stopping here (ip=0x7fa6a5a538)

Looks like it comes from libunwind.

mikem8361 added a commit that referenced this issue Aug 11, 2022
* Add logging callback to DAC EnumMemoryRegion API

Add the ICLRDataEnumMemoryRegionsLoggingCallback interface so createdump can receive logging from the EnumMemoryRegion API.

Add logging to MethodTable EnumMemoryRegion for invalid EEClass/Canonical MT.

* Add checked build logging after waitpid for createdump to help diagnose issue #72755.

* Add more DAC logging

* Code review feedback
 
Co-authored-by: Juan Hoyos <[email protected]>
@mikem8361
Copy link
Member

How can I see more recent failures (with a runtime with my new logging) with this test?

@tommcdon
Copy link
Member

How can I see more recent failures (with a runtime with my new logging) with this test?

Here's a runfo query that should provide the latest failures: https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E7+definition%3A662+name%3AParallelCrash.sh. You can change the timeframe by altering "started" to filter within a specific date range.

@hoyosjs
Copy link
Member

hoyosjs commented Aug 12, 2022

According to kusto, last failure was 2022-08-07T08:53:38.06Z, so nothing since your PR went in.

@AntonLapounov
Copy link
Member

Very likely a duplicate of #57621. Please reopen if you hit this again.

@AntonLapounov
Copy link
Member

Reopening, I did not check dates properly.

@AntonLapounov AntonLapounov reopened this Aug 13, 2022
@hoyosjs
Copy link
Member

hoyosjs commented Aug 15, 2022

This hasn't reproed since 8/7; moving to future until we have something to make it actionable.

@hoyosjs hoyosjs modified the milestones: 7.0.0, Future Aug 15, 2022
@tommcdon
Copy link
Member

ParallelCrash - no failures in the last 3 weeks, closing.

@ghost ghost locked as resolved and limited conversation to collaborators Sep 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

7 participants