
New GC decommit strategy for large/huge regions #109431

Merged (6 commits) on Feb 7, 2025

Conversation


@markples markples commented Oct 31, 2024

  • Use age rather than budget to determine whether large and huge regions should be decommitted (a minimal sketch follows this list)
    • Combine the new age logic with the previous age logic for basic regions
    • Simply add aged regions to the decommit list. In the future, this can be combined with move_highest_free_regions with age swapping to optimize which regions are decommitted
    • Factor the modified portions of distribute_free_regions into helper functions
    • Simplify distribute_free_regions where the logic now only applies to basic regions
  • Add logic to decommit more aggressively in high-memory-usage situations
  • Call distribute_free_regions during BGC (or during the ephemeral "pre-GC" before a BGC, if one occurs)
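
A minimal, self-contained sketch of the age-based idea described in the first bullet; every name here (free_region, max_age_before_decommit, age_and_partition) is hypothetical and not an identifier from gc.cpp:

```cpp
// Illustrative only: free large/huge regions carry an age that grows on each pass of
// distribute_free_regions while the region stays unused; once the age passes a
// threshold, the region is handed to the decommit list instead of being kept for reuse.
#include <cstddef>
#include <cstdio>
#include <vector>

struct free_region
{
    size_t size_in_bytes;
    int    age;  // passes of distribute_free_regions spent unused
};

// Hypothetical threshold; the real policy and its tuning live in gc.cpp.
const int max_age_before_decommit = 2;

static void age_and_partition (std::vector<free_region>& free_list,
                               std::vector<free_region>& decommit_list)
{
    std::vector<free_region> kept;
    for (free_region& region : free_list)
    {
        region.age++;
        if (region.age > max_age_before_decommit)
            decommit_list.push_back (region);   // aged out - decommit it
        else
            kept.push_back (region);            // still young enough to keep for reuse
    }
    free_list.swap (kept);
}

int main ()
{
    std::vector<free_region> free_list = { { 32 << 20, 0 }, { 64 << 20, 2 } };
    std::vector<free_region> decommit_list;
    age_and_partition (free_list, decommit_list);
    printf ("kept=%zu, to decommit=%zu\n", free_list.size (), decommit_list.size ());
    return 0;
}
```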

@markples markples added this to the 10.0.0 milestone Oct 31, 2024
@markples markples self-assigned this Oct 31, 2024

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@markples markples changed the title from "[draft] New decommit strategy for large/huge regions" to "New GC decommit strategy for large/huge regions" on Nov 19, 2024
@markples markples marked this pull request as ready for review November 19, 2024 10:54
@@ -53336,6 +53465,7 @@ bool gc_heap::compute_memory_settings(bool is_initialization, uint32_t& nhp, uin
}

m_high_memory_load_th = min ((high_memory_load_th + 5), v_high_memory_load_th);
almost_high_memory_load_th = max((high_memory_load_th - 5), 1u);

Member

Since this can be set by a config, it can be smaller than 5, so this can't be an unsigned number.

Contributor Author

good catch, thanks - note also that ~25 lines up there was a possibility for overflow
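
To make the concern concrete, here is a standalone sketch (not the PR's actual change) of why the unsigned subtraction misbehaves when the configured threshold is below 5, along with one possible signed clamp:

```cpp
// If the threshold is unsigned and a config sets it below 5, the subtraction wraps
// around to a huge value, so the max() clamp never takes effect.
#include <algorithm>
#include <cstdint>
#include <cstdio>

int main ()
{
    uint32_t high_memory_load_th = 3;   // e.g. configured unusually low

    // Buggy pattern: 3u - 5 wraps to 0xFFFFFFFE, and max() keeps the wrapped value.
    uint32_t wrapped = std::max ((high_memory_load_th - 5), 1u);

    // One safe alternative: do the subtraction in signed arithmetic, then clamp.
    int32_t safe = std::max ((int32_t)high_memory_load_th - 5, 1);

    printf ("wrapped = %u, safe = %d\n", wrapped, safe);
    return 0;
}
```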

@Maoni0 Maoni0 left a comment (Member)

Mark and I talked about the following changes that he'll do, but in general this looks good and we've gone through the perf results:

  1. make sure we are doing the right check for when we want to decommit (in distribute_surplus_p) called at the beginning of a BGC mark phase
  2. add a comment that talks about the policies and operations on the global and local lists (I've written an example for him)
  3. the calc of almost_high_memory_load_th in the comment above
  4. a couple of nit changes for naming


markples commented Nov 28, 2024

Done, except for (2):

  1. make sure we are doing the right check for when we want to decommit (in distribute_surplus_p) called at the beginning of a BGC mark phase

I believe this is correct. Previously we never called distribute_free_regions at this point, but we could call it during an ephemeral gc that is triggered at the start of BGC. In that case, we would choose to decommit a positive balance (background_running_p() is false). We want BGC behavior to match this possible ephemeral GC (as the ephemeral GC is optional (a policy choice) but we don't want distribute_free_regions behaving differently based on this decision). We have distribute_surplus_p return true if (settings.condemned_generation != max_generation), which means it returns false if condemned generation is 2 (BGC, not a foreground GC during BGC). Returning false means we don't distribute but instead decommit, which is what we want.

This will go in the comments.
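
For reference, the check described above boils down to something like the following paraphrase (the real distribute_surplus_p in gc.cpp consults settings.condemned_generation directly rather than taking parameters):

```cpp
// Returns true (distribute the surplus evenly) only for non-full-heap GCs; a gen2/BGC
// returns false, which means "decommit the surplus" - matching what the optional
// ephemeral "pre-GC" at the start of a BGC would do with a positive balance.
bool distribute_surplus_p (int condemned_generation, int max_generation)
{
    return (condemned_generation != max_generation);
}
```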

  2. add a comment that talks about the policies and operations on the global and local lists (I've written an example for him)
  3. the calc of almost_high_memory_load_th in the comment above

fixed this and a nearby similar problem

  4. a couple of nit changes for naming

done


markples commented Dec 2, 2024

Comment added for (2)


markples commented Dec 2, 2024

I manually built a new commit with all of these changes because rebase operations to squash some commits together were retriggering old merge conflicts. To see the most recent changes separately, see my backup branch https://github.com/markples/runtime/tree/new-dfr-backup or the specific commits:

fix underflow possibility and nearby overflow one
decide_on_decommit_strategy
dst -> dest to match majority of file
Rename remove_surplus_regions->trim_region_list and add_regions->grow_region_list.
dfr comment

or those 5 squashed in https://github.com/markples/runtime/tree/new-dfr-backup-squash at 3eef23b

// - aged_regions
// - surplus_regions
//
// For reason_induced_aggressive GCs, we decommit all regions. Therefore, the below description is
Contributor Author

all regions that we can

// a. A negative balance (deficit) for SOH (basic) will be distributed; it means we expect to use
// more memory than we have on the free lists. A negative balance isn't possible for LOH (large)
// since the budgets start at zero.
// b. For SOH (basic), we will decommit surplus regions unless we are in a foreground GC during BGC.
Contributor Author

This matches the old policy: if we don't decommit, then we distribute evenly. "foreground GC" is enough (it implies during BGC).
