Improve memory pressure detection #524

Open
Jromano1997 opened this issue Jan 23, 2025 · 1 comment
Labels: arrays (Things about the array abstraction), help wanted (Extra attention is needed)

Comments


Jromano1997 commented Jan 23, 2025

Hello,

I'm fairly new to Metal and GPU coding in general, so I hope this issue isn't only due to my lack of knowledge.

On my MacBook Air M1:
Metal.versioninfo()

macOS 15.2.0, Darwin 24.2.0

Toolchain:

  • Julia: 1.11.2
  • LLVM: 16.0.6

Julia packages:

  • Metal.jl: 1.5.1
  • GPUArrays: 11.2.1
  • GPUCompiler: 1.1.0
  • KernelAbstractions: 0.9.31
  • ObjectiveC: 3.2.0
  • LLVM: 9.1.3
  • LLVMDowngrader_jll: 0.6.0+0

1 device:

  • Apple M1 (1.501 GiB allocated)

If I run:

using Metal
N=2^13
X=rand(Float32,N,N)
mtl_X=MtlArray(X);

u=sum(exp.(mtl_X));

I see an increase in used memory in Activity Monitor,
and that memory is not freed until I close Julia.

(Image: Activity Monitor after running the line `u=sum(exp.(mtl_X));` multiple times)

If then I run:

n=30
for _ in 1:n Metal.@sync u=sum(exp.(mtl_X)) end

the program fills the RAM and my laptop freezes.

Image

Note that if I run instead:

mtl_A=similar(mtl_X)

n=300
for _ in 1:n
    Metal.@sync begin
        mtl_A .= exp.(mtl_X)
        u = sum(mtl_A)
    end
end

the code runs fine without any freezing. I therefore suspect that u=sum(exp.(mtl_X)) allocates a temporary MtlMatrix for exp.(mtl_X) that the garbage collector is unable to free. Is this standard behaviour?
Shouldn't the garbage collector be able to free the memory that it allocates?

maleadt (Member) commented Feb 20, 2025

> (Image: Activity Monitor after running the line `u=sum(exp.(mtl_X));` multiple times)

Running GC.gc(true) a single time after that frees up all that memory. Garbage collection being delayed is more or less expected behavior.

> If then I run:
>
> n=30
> for _ in 1:n Metal.@sync u=sum(exp.(mtl_X)) end
>
> the program fills the RAM and my laptop freezes.

I can confirm this makes the host device less responsive, but AFAICT the Julia GC still works properly. Calling GC.gc(true) afterwards, or simply interrupting the loop, makes memory usage drop down here. So if anything, this looks like macOS really doesn't like to be running close against the memory limit. Maybe it never returns an OOM error code (which we rely on to forcibly call the GC when running out of GPU memory), instead opting to page out memory which is really slow.
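Until detection improves, a user-side workaround along the lines of the observation above is to force a full collection every few iterations. The following is only a minimal sketch: `run_with_periodic_gc` and its `every` keyword are hypothetical names, not part of Metal.jl, and the interval of 5 is an arbitrary assumption that would need tuning against the size of the temporaries. A CPU array stands in for `mtl_X` so the sketch runs without a Metal device.

```julia
# Hypothetical helper (not a Metal.jl API): call `f` n times and force a
# full garbage collection every `every` iterations.
function run_with_periodic_gc(f, n::Integer; every::Integer = 5)
    collections = 0
    for i in 1:n
        f()
        if i % every == 0
            GC.gc(true)          # full collection, as suggested in the thread
            collections += 1
        end
    end
    return collections
end

# On a Metal device this could wrap the loop from the report, e.g.:
#   run_with_periodic_gc(30; every = 5) do
#       Metal.@sync sum(exp.(mtl_X))
#   end

# CPU stand-in so the sketch runs anywhere:
X = rand(Float32, 256, 256)
ncollections = run_with_periodic_gc(30; every = 5) do
    sum(exp.(X))
end
```

Forcing full collections costs time, so this trades throughput for bounded memory growth. The in-place variant from the original report avoids the temporaries altogether and is likely the better option where it applies.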

There are several possible solutions here. We could port CUDA.jl's early GC invocation heuristics based on GPU memory usage, JuliaGPU/CUDA.jl#2304. Or we could try to switch entirely to Julia's memory allocator, so that memory pressure from MtlArray is sensed by the GC, causing it to run earlier (assuming Julia itself does a better job here).
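To illustrate the general idea behind the first option, here is a rough sketch of a usage-based trigger. It is not the heuristic from JuliaGPU/CUDA.jl#2304 and not Metal.jl code: `PressureTracker`, `track_alloc!`, the `reclaim` callback, and the 0.75 threshold are all assumptions made for illustration. The tracker counts bytes handed out and runs a collection before an allocation would push usage past a fraction of the limit, instead of waiting for an out-of-memory error.

```julia
# Hypothetical sketch of early GC invocation based on tracked usage.
mutable struct PressureTracker
    limit::Int           # bytes considered available (assumed known)
    used::Int            # bytes currently tracked as allocated
    threshold::Float64   # fraction of `limit` that triggers a collection
    collections::Int     # number of collections triggered so far
end

PressureTracker(limit::Integer; threshold = 0.75) =
    PressureTracker(limit, 0, threshold, 0)

# Record an allocation; reclaim first if it would cross the threshold.
function track_alloc!(t::PressureTracker, nbytes::Integer;
                      reclaim = () -> GC.gc(false))
    if (t.used + nbytes) / t.limit > t.threshold
        reclaim()
        t.collections += 1
    end
    t.used += nbytes
    return t
end

# Simulation: 30 temporaries the size of exp.(mtl_X) against an assumed
# 8 GiB limit. In a real allocator, finalizers run by the GC would lower
# `used`; here the callback resets it to model that.
tracker = PressureTracker(8 * 2^30)
temp = 2^13 * 2^13 * sizeof(Float32)   # 256 MiB per temporary
for _ in 1:30
    track_alloc!(tracker, temp;
                 reclaim = () -> (GC.gc(false); tracker.used = 0))
end
```

In this simulation a single collection is triggered, on the 25th allocation, rather than memory growing unchecked through all 30 iterations. A real implementation would also have to decide between incremental and full collections and account for memory that stays live across a collection.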

maleadt changed the title from "Garbage collection doesn't trigger on MtlArrays" to "Improve memory pressure detection" on Feb 20, 2025
maleadt added the "arrays" and "help wanted" labels on Feb 20, 2025