I'm fairly new to Metal and GPU coding in general, so I hope this issue isn't only due to my lack of knowledge.
On my MacBook Air M1: Metal.versioninfo()
macOS 15.2.0, Darwin 24.2.0
Toolchain:
Julia: 1.11.2
LLVM: 16.0.6
Julia packages:
Metal.jl: 1.5.1
GPUArrays: 11.2.1
GPUCompiler: 1.1.0
KernelAbstractions: 0.9.31
ObjectiveC: 3.2.0
LLVM: 9.1.3
LLVMDowngrader_jll: 0.6.0+0
1 device:
Apple M1 (1.501 GiB allocated)
If I run:
using Metal
N=2^13
X=rand(Float32,N,N)
mtl_X=MtlArray(X);
u=sum(exp.(mtl_X));
Activity Monitor shows the used memory increasing,
and that memory is not freed until I close Julia.
(Image: Activity monitor resulting from running the line `u=sum(exp.(mtl_X));` multiple times)
If then I run:
n=30
for _ in 1:n
    Metal.@sync u=sum(exp.(mtl_X))
end
the program fills the RAM and my laptop freezes.
Note that if I run instead:
mtl_A=similar(mtl_X)
n=300
for _ in 1:n
    Metal.@sync begin
        mtl_A.=exp.(mtl_X)
        u=sum(mtl_A)
    end
end
the code runs just fine without any freezing. I thus suspect that u=sum(exp.(mtl_X)) allocates a temporary MtlMatrix for exp.(mtl_X), which the garbage collector is unable to free. Is this standard behaviour?
Shouldn't the garbage collector be able to free the memory that it allocates?
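For illustration, here is a minimal sketch of how the temporary could be avoided altogether, assuming the goal is only the scalar sum. It relies on the `sum(f, A)` and `mapreduce` methods from Base, which apply the function inside the reduction instead of materializing `exp.(mtl_X)` first. The variable names are illustrative, and the reduction may still allocate a small scratch buffer internally.

```julia
using Metal

N = 2^13
mtl_X = MtlArray(rand(Float32, N, N))

# Broadcasting first materializes exp.(mtl_X) as a temporary N×N MtlMatrix,
# which is then reduced and left for the garbage collector.
u_tmp = sum(exp.(mtl_X))

# Passing the function to the reduction applies exp element by element
# inside the reduction, so no full-size temporary is created.
u_fused = sum(exp, mtl_X)

# Equivalent spelling with an explicit reduction operator.
u_mapreduce = mapreduce(exp, +, mtl_X)
```

This does not address the delayed collection itself; it only sidesteps the large allocation in this particular expression.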
(Image: Activity monitor resulting from running the line u=sum(exp.(mtl_X)); multiple times)
Running GC.gc(true) a single time after that frees up all that memory. Since garbage collection is delayed, that is kind of expected behavior.
If then I run:
n=30
for _ in 1:n
    Metal.@sync u=sum(exp.(mtl_X))
end
the program fills the RAM and my laptop freezes.
I can confirm this makes the host device less responsive, but AFAICT the Julia GC still works properly. Calling GC.gc(true) afterwards, or simply interrupting the loop, makes memory usage drop down here. So if anything, this looks like macOS really doesn't like running close to the memory limit. Maybe it never returns an OOM error code (which we rely on to forcibly call the GC when running out of GPU memory), instead opting to page out memory, which is really slow.
There are several possible solutions here. We could port CUDA.jl's early GC invocation heuristics based on GPU memory usage, JuliaGPU/CUDA.jl#2304. Or we could try to switch entirely to Julia's memory allocator, so that memory pressure from MtlArray is sensed by the GC, causing it to run earlier (assuming Julia itself does a better job here).
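Until something like that exists, a user-side workaround can be sketched as follows: since GC.gc(true) was observed above to release the memory, the loop can call it every few iterations. This is only a rough illustration, not the heuristic from the CUDA.jl PR; the function name `sum_exp_repeatedly` and the `gc_every` interval are hypothetical and would need tuning to the available memory.

```julia
using Metal

# Hypothetical helper: repeats the reduction n times and forces a full
# garbage collection every `gc_every` iterations, so the temporaries
# created by exp.(x) are released before they pile up.
function sum_exp_repeatedly(x, n; gc_every=5)
    u = zero(eltype(x))
    for i in 1:n
        u = Metal.@sync sum(exp.(x))
        if i % gc_every == 0
            GC.gc(true)  # full collection, as suggested above
        end
    end
    return u
end

mtl_X = MtlArray(rand(Float32, 2^13, 2^13))
u = sum_exp_repeatedly(mtl_X, 30)
```

A full collection is comparatively expensive, so a small interval trades throughput for a lower peak memory footprint.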
maleadt changed the title from "Garbage collection doesn't trigger on MtlArrays" to "Improve memory pressure detection" on Feb 20, 2025