
Unreclaimed ArrayPool<byte>.Shared memory after processing heavy GIF animations #2889

Open
4 tasks done
hexawyz opened this issue Feb 18, 2025 · 11 comments


hexawyz commented Feb 18, 2025

Prerequisites

  • I have written a descriptive issue title
  • I have verified that I am running the latest version of ImageSharp
  • I have verified that the problem exists in both DEBUG and RELEASE mode
  • I have searched open and closed issues to ensure it has not already been reported

ImageSharp version

3.1.6

Other ImageSharp packages and versions

None

Environment (Operating system, version and so on)

Windows 11 23H2

.NET Framework version

9.0.200

Description

When processing GIF animations with generally small dimensions (roughly 500x500 at most, often less), mainly for cropping and resizing, I noticed a significant increase in memory consumption after these operations. This was using the default MemoryAllocator.
Memory was not freed by calling ReleaseRetainedResources.

From investigations, this seems to come from the fact that the UniformUnmanagedMemoryPoolMemoryAllocator will defer to the standard ArrayPool for all buffers below 1MB, and sadly, these arrays will never be reclaimed:

var buffer = new SharedArrayPoolBuffer<T>((int)totalLengthInElements);

(I also just found this issue which seems to mention a similar problem: #2139)

In my case, because the dimensions of images are relatively small, it turns out that frames will almost always fit in less than 1MB, but there can be many frames.
A workaround is to use the SimpleGcMemoryAllocator, but it comes with other problems and will also likely introduce fragmentation in the LOH.

As an easy improvement, it seems like it would be straightforward to allow customizing the sharedArrayPoolThresholdInBytes parameter, which currently always defaults to 1MB.
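The workaround mentioned above can be applied globally; a minimal sketch using the public SimpleGcMemoryAllocator:

```csharp
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Memory;

// Route all ImageSharp allocations through the GC instead of the default
// ArrayPool/unmanaged-pool hybrid. This avoids retained pool memory, but
// large frame buffers will land on the LOH, as noted above.
Configuration.Default.MemoryAllocator = new SimpleGcMemoryAllocator();
```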

Steps to Reproduce

  1. Download a big animated GIF from Giphy or similar
  2. Crop/Resize the image to 320x320
  3. Observe the state of the heap
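The steps above might look like the following (the file name is a placeholder):

```csharp
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Processing;

using (var image = Image.Load("big-animation.gif")) // hypothetical input file
{
    // Resizing a multi-frame GIF processes every frame, so each frame
    // rents its own sub-1MB buffer from ArrayPool<byte>.Shared.
    image.Mutate(x => x.Resize(320, 320));
    image.SaveAsGif("resized.gif");
}

// Inspect the heap here (e.g. with dotnet-counters or a memory profiler):
// the ArrayPool<byte>.Shared buckets stay populated after disposal.
```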

Images

No response

@JimBobSquarePants
Member

@antonfirsov tagging you here as you're the expert.

The best solution IMO would be either:

  1. Make UniformUnmanagedMemoryPoolMemoryAllocator public and allow the threshold for the ArrayPool.Shared instance to be set to zero.
  2. Scrap using ArrayPool.Shared completely.

@antonfirsov
Member

@hexawyz could you help us by sharing a minimal runnable repro with a specific gif, just in case?

and sadly, these arrays will never be reclaimed

Reclaim should happen at buffer/image disposal. If the user code disposes things correctly and the library is still leaking arraypool arrays, it's a bug.


JimBobSquarePants commented Feb 18, 2025

I think what they mean is that the ArrayPool.Shared pool never depletes.


antonfirsov commented Feb 18, 2025

Sorry for the misunderstanding!

There is a trimming mechanism in the ArrayPool.Shared implementation (triggered by Gen2 GC runs) that should trim away pooled arrays, and the aggressiveness of the trimming depends on the memory pressure. Therefore, if there is memory traffic in the app triggering GC, the memory footprint of the pool should eventually go down after the GIF operations finish, meaning that ideally this should not lead to problems in applications.

We can expose knobs to control pool thresholds on MemoryAllocatorOptions, but I'm conservative about turning implementation details into public options, so if ArrayPool's built-in trimming does a good-enough job to avoid real-life problems, I would prefer not to.
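For reference, a sketch of the knob that MemoryAllocatorOptions already exposes in 3.1.x (a cap on the unmanaged pool; the ArrayPool.Shared threshold discussed here is not configurable):

```csharp
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Memory;

// The existing option caps the retained unmanaged pool size; buffers
// below the internal 1MB threshold still go to ArrayPool<byte>.Shared.
Configuration.Default.MemoryAllocator = MemoryAllocator.Create(
    new MemoryAllocatorOptions
    {
        MaximumPoolSizeMegabytes = 32
    });
```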

@hexawyz does this make sense to you? Do you experience issues beyond just seeing an overly large memory retention?


hexawyz commented Feb 18, 2025

My issues are actually the large retention of memory 😅

Therefore, if there is memory traffic in the app triggering GC

Yeah, but that's exactly my problem here, this is not a server app 🙂
(One of the goals of this app is explicitly to keep memory usage reasonably low)

From my understanding, ArrayPool.Shared will never efficiently release the memory unless the whole system is under some kind of memory pressure. And as it also won't trigger unless a Gen2 GC occurs in the process at some point, it could take a very long time for a program that is otherwise relatively conservative about memory allocations.
According to the implementation that was done here (recent here), I'm assuming it would take upwards of 100 iterations of Gen2 GC to trim 100+ unused buffers. So, we're speaking about potentially dozens of minutes, if not hours, of cleanup after an operation that lasted seconds at most.

I was doing some tests again a few minutes ago, and here's what I got:

In this case, I used a 400x400 image that is about 3MB compressed and applied my transformations. (This GIF and the transformation code are here in case you want to have a look.)
Processing using the default pool allocated over a hundred megabytes of pooled arrays (113MB to be exact).
The total allocated .NET heap memory rose to 117MB (mostly locked in), compared to 52MB at the observable peak for SimpleGcMemoryAllocator. (In both cases, the OS memory usage for the process stayed higher than the respective amount.)

Moreover, in the previous days when I did some non-memory-related tests, transforming multiple images in succession led to the memory growing higher and higher. For example, at one point it rose to more than 700MB of RAM usage before I stopped the app.
I'm not entirely sure what is causing that, as it could be a complex combination of the bucketing behaviors in the SharedArrayPool and requesting different buffer sizes at different times.
But what's sure is that doing transformations on GIF files only makes the problem worse, as the multiple frames need to be kept in memory. I'm assuming that all of this is why I saw such a dramatic increase in memory usage using the default allocator.

There's also another thing to consider. I don't know what the reasonable limit is for requesting an array from the shared pool, but 1MB feels like a huge amount to request from a pool that is likely to be used mostly for smallish things. It also seems unwise to request more than a handful of shared buffers per operation, which is a very specific problem of GIF images here.
Granted, the current implementation of the memory allocator may not be a problem for server apps, but for desktop apps, it works in opposition to the Workstation GC, which tries to aggressively relinquish memory to the OS. (And actually does a good job at it.)

So, I'm actually honestly wondering if there is really any value in using the array pool at all.
(BTW I'm genuinely interested to know what the reason was for using ArrayPool here in the first place, if you have details.)

At least, I believe it would be helpful to be able to opt out of the usage of the shared ArrayPool. Another benefit of that would be avoiding fragmentation of the GC heap(s) and relying on the LOH too much.
But TBH, I'm now tempted to try my luck at implementing an unpooled native memory allocator…


antonfirsov commented Feb 19, 2025

ArrayPool lowers the GC (or in general, allocation/deallocation) pressure and somewhat improves data locality which shows up on throughput benchmarks. I would need to look up and re-run benchmarks to be more specific.

One of the goals of this app is explicitly to keep memory usage reasonably low

That's a reasonable goal for a desktop app.

At least, I believe it would be helful to be able to opt out of the usage of the shared ArrayPool

Since your buffers are small enough to not hit the unmanaged pool threshold, the behavior would be basically equivalent to what you get with SimpleGcMemoryAllocator.

tempted to try my luck at implementing an unpooled native memory allocator

It's relatively simple, I may put up a gist tomorrow.

Would it make sense to benchmark the default pooling allocator against SimpleGcMemoryAllocator and an unmanaged one for the scenarios you care about? (Then, based on the data, we could see if we can add something to the lib to help fellow desktop devs.)

doing transformations on GIF files only makes the problem worse, as the multiple frames need to be kept in memory

Gifs stress any memory management mechanism to the extreme. I'm getting ideas like a layer for multi-frame images that slices bigger contiguous buffers down to smaller ones whenever beneficial (for very large images we do it the other way around -- composing them out of smaller 4MB buffers). Or even better, some facility that allows frame-by-frame stream processing of gifs. But these ideas lead too far and beyond the team's current capacity to build things :(


hexawyz commented Feb 19, 2025

So, I actually tried to implement a full native unpooled memory allocator, and it seemed to work quite well (maybe I've got a slight performance degradation because of this? Would be hard to tell, honestly).
I honestly spent more time discovering that I had forgotten to .Dispose() the image I opened when I tweaked the code yesterday, which would lead the buffers to not be finalized 😑 (Need to dig into this a bit. I added the finalizer just in case, but I'm a bit afraid of CA2015 now.)
But anyway, after solving this problem, it worked like a charm. I decided to fall back on the default allocator for buffers up to 256 bytes, which seemed to be somewhat common but not in high numbers, unlike image buffers.
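A minimal sketch of what such an unpooled native allocator could look like (class and member names are hypothetical; refcounting, diagnostics, and the small-buffer fallback are omitted):

```csharp
using System;
using System.Buffers;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using SixLabors.ImageSharp.Memory;

// Every buffer is a fresh NativeMemory allocation, freed on Dispose
// (with a finalizer as a safety net), so nothing is retained between
// operations and the GC heap stays untouched.
public sealed class UnpooledNativeMemoryAllocator : MemoryAllocator
{
    protected internal override int GetBufferCapacityInBytes() => int.MaxValue;

    public override IMemoryOwner<T> Allocate<T>(
        int length, AllocationOptions options = AllocationOptions.None)
        => new NativeBuffer<T>(length, options == AllocationOptions.Clean);

    private sealed unsafe class NativeBuffer<T> : MemoryManager<T>
        where T : struct
    {
        private void* pointer;
        private readonly int length;

        public NativeBuffer(int length, bool clear)
        {
            nuint byteCount = (nuint)length * (nuint)Unsafe.SizeOf<T>();
            this.pointer = clear
                ? NativeMemory.AllocZeroed(byteCount)
                : NativeMemory.Alloc(byteCount);
            this.length = length;
        }

        // Safety net for undisposed buffers; Dispose suppresses it.
        ~NativeBuffer() => this.Free();

        public override Span<T> GetSpan() => new(this.pointer, this.length);

        // Native memory never moves, so pinning is a no-op.
        public override MemoryHandle Pin(int elementIndex = 0)
            => new((byte*)this.pointer + (long)elementIndex * Unsafe.SizeOf<T>());

        public override void Unpin()
        {
        }

        protected override void Dispose(bool disposing) => this.Free();

        private void Free()
        {
            if (this.pointer != null)
            {
                NativeMemory.Free(this.pointer);
                this.pointer = null;
            }
        }
    }
}
```

It would be assigned via `Configuration.Default.MemoryAllocator = new UnpooledNativeMemoryAllocator();` just like the built-in allocators.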

ArrayPool lowers the GC (or in general, allocation/deallocation) pressure and somewhat improves data locality which shows up on throughput benchmarks. I would need to look up and re-run benchmarks to be more specific.

Hmm, good point about data locality, I didn't think about that. I'm wondering though if, in that case, it still wouldn't be worth maintaining a specialized ImageSharp-only pool of buffers, maybe with some kind of hard limit on the number of pooled items.

Would it make sense to benchmark the default pooling allocator against SimpleGcMemoryAllocator and an unmanaged one for the scenarios you care about? (Then, based on the data, we could see if we can add something to the lib to help fellow desktop devs.)

I think I'll take low memory usage over ultimate performance for now 😂

Since your buffers are small enough to not hit the unmanaged pool threshold, the behavior would be basically equivalent to what you get with SimpleGcMemoryAllocator.

Hmm, I was thinking about directly falling back to unmanaged memory allocations rather than managed arrays, but yeah.


hexawyz commented Feb 19, 2025

Hmm wait no, it seems I actually never .Dispose()d the Image 🙁
But apparently, Image and ImageFrame don't have finalizers, and neither do the pooled buffers.

So I had in fact two problems. First was indeed that I had too many allocations in the ArrayPool for a single Image.
Second was that the memory allocated by Image instances was never reclaimed as Dispose() was not called on the image. (Explaining why it rose image after image initially. TBH I didn't even bother looking at the multiple images case afterwards)
The only MemoryAllocator not suffering from this is SimpleGcMemoryAllocator, as it obviously allows the GC to see through everything.

Might be worth considering re-adding finalizers somewhere?
Although we are obviously supposed to Dispose everything, this seems like an easy way to shoot oneself in the foot.


antonfirsov commented Feb 19, 2025

Might be worth considering re-adding finalizers somewhere

There is a finalizer in the lowest-level thingie that is responsible for managing the buffer ownership, which is our equivalent of SafeHandle but without an actual handle value inside. With that, there is no need for finalizers at higher levels, just like you don't need to implement finalizers for types that own a SafeHandle. In fact, the recommended guideline these days is to avoid finalizers whenever possible.

Hmm wait no, it seems I actually never .Disposed() the Image

Pity I forgot to mention it earlier, but we have a diagnostic utility class MemoryDiagnostics to help catching this.
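If I recall the shape of that API correctly, hooking it up looks roughly like this:

```csharp
using System;
using SixLabors.ImageSharp.Diagnostics;

// Fires (on the finalizer thread) whenever a pooled buffer is finalized
// without having been disposed; the argument carries the allocation
// stack trace when diagnostics are enabled.
MemoryDiagnostics.UndisposedAllocation += stackTrace =>
    Console.WriteLine($"Undisposed ImageSharp buffer:{Environment.NewLine}{stackTrace}");

// A cheap counter is also available for spot checks.
Console.WriteLine(MemoryDiagnostics.TotalUndisposedAllocationCount);
```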

So, I actually tried to implement a full native unpooled memory allocator

Your code seems to be correct at first glance, but if you want to be 100% on the safe side, consider delegating the finalization and the refcounting concerns to SafeHandle. See the official Memory<T> guidelines for a code sample. That would also get rid of CA2015.
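A condensed sketch of that pattern (following the official Memory&lt;T&gt; guidelines; the type name is hypothetical):

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// The SafeHandle owns the native pointer: the runtime guarantees
// exactly-one release even under finalization races, so the
// MemoryManager wrapping it needs no finalizer of its own and
// CA2015 no longer applies.
internal sealed unsafe class NativeMemoryHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    public NativeMemoryHandle(nuint byteCount)
        : base(ownsHandle: true)
        => this.SetHandle((IntPtr)NativeMemory.Alloc(byteCount));

    protected override bool ReleaseHandle()
    {
        NativeMemory.Free((void*)this.handle);
        return true;
    }
}
```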

I decided to fallback on the default allocator for buffers up to 256 bytes

IMO 256 bytes is too low, which might hurt throughput. You can definitely go higher, likely into the range of tens of KBs. Only benchmarks can tell the exact number.


antonfirsov commented Feb 19, 2025

You likely have hit this actually:

// If this is called by a finalizer, we will end storing the first array of this bucket
// on the thread local storage of the finalizer thread.
// This is not ideal, but subsequent leaks will end up returning arrays to per-cpu buckets,
// meaning likely a different bucket than it was rented from,
// but this is PROBABLY better than not returning the arrays at all.
ArrayPool<byte>.Shared.Return(this.array!);
this.array = null;


hexawyz commented Feb 19, 2025

There is a finalizer in the lowest-level thingie that is responsible for managing the buffer ownership which is our equivalent for SafeHandle but without an actual handle value inside.

Oh yeah, thanks, I clearly missed that 😄
So the missing Dispose wouldn't have been the reason for everything 😮‍💨

Pity I forgot to mention it earlier, but we have a diagnostic utility class MemoryDiagnostics to help catching this.

Well, I found out the hard way by implementing a similar thing in my native allocator. It was also useful for identifying the different buffer sizes that were requested.

See the official Memory guidelines for a code sample. That would also get rid of CA2015.

Hmm yeah, I may trade the "useful" finalizer for a debug-logging finalizer. This might be good enough.

IMO 256 bytes is too low which might hurt throughput. You can definitely go higher, likely into the ranges of tens of KBs. Only benchmarks can tell the exact number.

My worry is still with the factual reusability of such buffers. There should be potential for reuse of buffers up to 1kB or possibly 4kB, but I think as the buffers become larger, the overhead of the MemoryManager becomes less significant anyway. (And throughput is not that important to me currently.)
When I observed the allocations that were taking place, I was surprised to see quite a few very short-lived allocations, such as 4 bytes, 255 bytes, etc. (IIRC that was when the image was being loaded.) That's why I settled on 256 here.

3 participants