
dlopen hijacking ignores rpath #4001

Closed
vchuravy opened this issue Aug 3, 2019 · 10 comments · Fixed by #4037

vchuravy commented Aug 3, 2019

We are using CUDA-aware OpenMPI and run into the following failure scenario:

julia -e 'ccall((:MPI_Init, :libmpi), Nothing, (Ptr{Cint},Ptr{Cint}), C_NULL, C_NULL); expm1(1.0)'
ERROR: could not load library "libopenlibm"
libopenlibm.so: cannot open shared object file: No such file or directory

Looking at the output of LD_DEBUG=all:
When loading MPI:

     56617:    file=libopenlibm.so [0];  dynamically loaded by /central/software/ucx/1.5.1_cuda-10.0/lib/libucm.so.0 [0]
     56617:    find library=libopenlibm.so [0]; searching
     56617:     search path=/central/software/CUDA/10.0/lib64        (LD_LIBRARY_PATH)
     56617:      trying file=/central/software/CUDA/10.0/lib64/libopenlibm.so
     56617:     search path=/central/software/julia/1.1.0/bin/../lib        (RPATH from file julia)
     56617:      trying file=/central/software/julia/1.1.0/bin/../lib/libopenlibm.so
     56617:     search path=/software/julia/1.1.0//lib:/central/software/OpenMPI/4.0.1_cuda-10.0//lib:/central/software/CUDA/10.0/lib64        (LD_LIBRARY_PATH)
     56617:      trying file=/software/julia/1.1.0//lib/libopenlibm.so
     56617:      trying file=/central/software/OpenMPI/4.0.1_cuda-10.0//lib/libopenlibm.so
     56617:      trying file=/central/software/CUDA/10.0/lib64/libopenlibm.so
     56617:     search cache=/etc/ld.so.cache
     56617:     search path=/lib64/tls:/lib64:/usr/lib64/tls:/usr/lib64        (system search path)
     56617:      trying file=/lib64/tls/libopenlibm.so
     56617:      trying file=/lib64/libopenlibm.so
     56617:      trying file=/usr/lib64/tls/libopenlibm.so
     56617:      trying file=/usr/lib64/libopenlibm.so

Without loading MPI:

     58221:    file=libopenlibm.so [0];  dynamically loaded by /central/software/julia/1.1.0/bin/../lib/libjulia.so.1 [0]
     58221:    find library=libopenlibm.so [0]; searching
     58221:     search path=/central/software/julia/1.1.0/bin/../lib/julia:/central/software/julia/1.1.0/bin/../lib        (RPATH from file /central/software/julia/1.1.0/bin/../lib/libjulia.so.1)
     58221:      trying file=/central/software/julia/1.1.0/bin/../lib/julia/libopenlibm.so
     58221:    
     58221:    file=libopenlibm.so [0];  generating link map
     58221:      dynamic: 0x00002aaad615fd80  base: 0x00002aaad5f32000   size: 0x000000000022f2d0
     58221:        entry: 0x00002aaad5f37290  phdr: 0x00002aaad5f32040  phnum:   

The loader is attributing the dlopen to libucm.so.0 instead of libjulia.so.1.
The RPATH of libjulia.so.1 is $ORIGIN:$ORIGIN/julia, the second of which is the installation location of libopenlibm. Normally the dlopen is done through libjulia, so the RPATH is picked up correctly.
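
For illustration, here is a minimal, hypothetical sketch of a dlopen interposition (e.g. via LD_PRELOAD; UCM installs its hook differently, but the effect on path resolution is the same). It shows why the load request gets attributed to the hooking library, so libjulia's $ORIGIN-based RPATH is never consulted for libopenlibm.so:

/* shim.c: hypothetical dlopen interposer built as its own shared object.
 * A sketch only, not UCX/UCM code. */
#define _GNU_SOURCE
#include <dlfcn.h>

void *dlopen(const char *filename, int flags)
{
    static void *(*real_dlopen)(const char *, int);

    if (real_dlopen == NULL) {
        /* Look up the next (real) dlopen in the symbol lookup chain. */
        real_dlopen = (void *(*)(const char *, int))dlsym(RTLD_NEXT, "dlopen");
    }

    /* ...memory-hook bookkeeping would go here... */

    /* glibc resolves a relative filename against the RPATH/RUNPATH of the
     * object this call comes from, i.e. this shim, not the original caller
     * (libjulia.so.1 with $ORIGIN:$ORIGIN/julia). */
    return real_dlopen(filename, flags);
}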

We can fix this locally by disabling the memory hooks (as SLURM does) https://github.com/SchedMD/slurm/blob/5fe040f0cca02c8dc92e733e7b10d0067a9fed8a/src/plugins/mpi/pmix/pmixp_dconn_ucx.c#L151-L162

Why does UCX rewrite dlopen? That is incredibly invasive.

cc: @simonbyrne


shamisp commented Aug 3, 2019

@yosefe seems like ucm is breaking rpath


yosefe commented Aug 3, 2019

When using CUDA, it's required to override dlopen() to hook cudaMalloc/Free etc. and cache the correct memory type (host/device) for all libraries loaded in the future.
We probably need to make dlopen aware of the rpath of the original binary.
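
As a hedged sketch of what "making dlopen aware of the caller's rpath" could build on (glibc and ELF assumed; print_runpath is a hypothetical helper, not the eventual fix in #4037): an interposed dlopen could read the DT_RPATH/DT_RUNPATH of the object that originally issued the call, e.g. libjulia.so.1, and retry a relative library name against those directories.

/* rpath_lookup.c: hypothetical sketch, glibc-specific (dlinfo/RTLD_DI_LINKMAP). */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>

/* Print the DT_RPATH/DT_RUNPATH entries of an already-loaded object. */
static void print_runpath(const char *soname)
{
    void *handle = dlopen(soname, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == NULL)
        return;

    struct link_map *map = NULL;
    if (dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0 && map != NULL) {
        const char *strtab = NULL;
        const ElfW(Dyn) *dyn;

        /* First pass: find the dynamic string table (on glibc/Linux the
         * loader has already rewritten this to an absolute address). */
        for (dyn = map->l_ld; dyn->d_tag != DT_NULL; ++dyn)
            if (dyn->d_tag == DT_STRTAB)
                strtab = (const char *)dyn->d_un.d_ptr;

        /* Second pass: RPATH/RUNPATH are offsets into that string table. */
        for (dyn = map->l_ld; strtab != NULL && dyn->d_tag != DT_NULL; ++dyn)
            if (dyn->d_tag == DT_RPATH || dyn->d_tag == DT_RUNPATH)
                printf("%s: %s\n", map->l_name, strtab + dyn->d_un.d_val);
    }

    dlclose(handle);
}

int main(void)
{
    print_runpath("libjulia.so.1");   /* hypothetical target */
    return 0;
}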


Keno commented Aug 3, 2019

So the problem is that the CUDA libraries don't have an appropriate interface? With the recent NVIDIA acquisition of Mellanox, isn't there a way to work out a better interface here than trying to re-implement a dynamic linker in a message passing library?


vchuravy commented Aug 3, 2019

I was wondering if one couldn't use (and cache) the result of cudaPointerGetAttributes.


bureddy commented Aug 4, 2019

@vchuravy we use cudaPointerGetAttributes when the pointer cache is disabled. We use the cache for better performance, because cudaPointerGetAttributes can have 0.2-0.5 us of overhead.


vchuravy commented Aug 4, 2019

Sure, that makes sense, but you could cache the results of cudaPointerGetAttributes instead of hijacking mmap and malloc.
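
For concreteness, a minimal sketch of that suggestion (CUDA runtime API; the names ptr_type_cache and lookup_mem_type are hypothetical, not UCX code). As the next comment points out, cudaFree() can recycle a virtual address with a different memory type, so a real cache would also need invalidation on free/unmap events:

/* ptr_cache.c: hypothetical memoization of cudaPointerGetAttributes results. */
#include <cuda_runtime.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 4096                 /* toy direct-mapped cache */

typedef struct {
    const void          *ptr;
    enum cudaMemoryType  type;
    int                  valid;
} ptr_type_entry_t;

static ptr_type_entry_t ptr_type_cache[CACHE_SIZE];

static enum cudaMemoryType lookup_mem_type(const void *ptr)
{
    size_t slot = ((uintptr_t)ptr >> 6) % CACHE_SIZE;
    ptr_type_entry_t *e = &ptr_type_cache[slot];

    if (e->valid && e->ptr == ptr)
        return e->type;                 /* cache hit: no CUDA call */

    struct cudaPointerAttributes attr;
    enum cudaMemoryType type = cudaMemoryTypeHost;

    if (cudaPointerGetAttributes(&attr, ptr) == cudaSuccess)
        type = attr.type;               /* CUDA >= 10; older releases expose memoryType */

    e->ptr   = ptr;
    e->type  = type;
    e->valid = 1;
    return type;
}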


bureddy commented Aug 4, 2019

@vchuravy It may not be reliable: if cudaFree() happens, the same virtual address could later be a different memory type.


shamisp commented Aug 5, 2019

@bureddy I think the check in CUDA was also very expensive (a system call?); this is not something we can do in the communication path.

simonbyrne commented:
Is there a way to hook into the cache, or another way for programs to provide the device/host info themselves?


shamisp commented Aug 6, 2019

@simonbyrne Even if we introduce some sort of interface that lets the user indicate which memory type is used, there is no way to pass this through the MPI interface. Let's assume for a second that you use UCX directly. The allocation of memory can happen in a 3rd-party library (which the language has no control over), and you may not know the origins of the memory.
