Understanding Elemental's Performance #72

Open · JBlaschke opened this issue Oct 3, 2021 · 7 comments

@JBlaschke
Member

Hi,

I am trying to understand the performance of the program below at NERSC. It is basically the same as the example in the README.md, except that addprocs currently doesn't work here, so I am driving the MPIClusterManager manually via start_main_loop and stop_main_loop.
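For comparison, the README-style addprocs approach looks roughly like this (just a sketch -- np=4 is a placeholder, and addprocs is the step that doesn't work for me):

# README-style approach: spawn MPI worker processes from the Julia session
using MPIClusterManagers, Distributed

manager = MPIManager(np=4)   # np=4 is a placeholder
addprocs(manager)            # this is the step that fails on our system

@mpi_do manager begin
    using MPI
    comm = MPI.COMM_WORLD
    println("Hello, I am rank $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))")
end

The manual script that I actually run is: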

N = parse(Int64, ARGS[1])

# to import MPIManager
using MPIClusterManagers

# need to also import Distributed to use addprocs()
using Distributed

# Manage MPIManager manually -- all MPI ranks do the same work
# Start MPIManager
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)

@mpi_do manager begin
    using MPI
    comm = MPI.COMM_WORLD
    println(
            "Hello world,"
            * " I am $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))"
            * " on node $(gethostname())"
           )

    println("[rank $(MPI.Comm_rank(comm))]: Importing Elemental")
    using LinearAlgebra, Elemental
    println("[rank $(MPI.Comm_rank(comm))]: Done importing Elemental")

    println("[rank $(MPI.Comm_rank(comm))]: Solving SVD for $(N)x$(N)")
end

@mpi_do manager A = Elemental.DistMatrix(Float64);
@mpi_do manager Elemental.gaussian!(A, N, N);
@mpi_do manager @time U, s, V = svd(A); 
@mpi_do manager println(s[1])

# Manage MPIManager manually:
# Elemental needs to be finalized before shutting down MPIManager
@mpi_do manager begin
    println("[rank $(MPI.Comm_rank(comm))]: Finalizing Elemental")
    Elemental.Finalize()
    println("[rank $(MPI.Comm_rank(comm))]: Done finalizing Elemental")
end
# Shut down MPIManager
MPIClusterManagers.stop_main_loop(manager)

I ran some strong scaling tests on 4 Intel Haswell nodes (https://docs.nersc.gov/systems/cori/#haswell-compute-nodes) using 4000x4000, 8000x8000, and 16000x16000 random matrices.

[chart: measured svd(A) times from the strong-scaling runs]

I am measuring only the svd(A) time. I am attaching my measured times and wanted to check whether this is what you would expect. I am not an expert in how Elemental computes SVDs in a distributed fashion, so I would be grateful for any advice you have on optimizing this benchmark's performance. In particular, I would like to understand what the optimal number of ranks is as a function of problem size (I am hoping this is such an obvious question that you can point me to existing documentation).
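To make the numbers comparable across ranks, I could also time with a barrier plus a max-reduction instead of relying on per-rank @time output. A sketch of that (it assumes a recent MPI.jl where Allreduce accepts the Julia max function as the reduction operator; older versions would use MPI.MAX):

@mpi_do manager begin
    # Synchronize all ranks before timing, time the SVD locally,
    # then report the slowest rank's time (the time the job actually takes).
    MPI.Barrier(comm)
    t0 = time()
    U, s, V = svd(A)
    t = time() - t0
    tmax = MPI.Allreduce(t, max, comm)
    if MPI.Comm_rank(comm) == 0
        println("svd(A) for $(N)x$(N): $(tmax) s (max over ranks)")
    end
end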

Cheers!

@andreasnoack
Member

First, it might be useful to confirm that the same pattern shows up when you compile and run a C++ version of this problem.

@JBlaschke
Member Author

That was what I was thinking. Unfortunately I am not familiar with how to use Elemental directly, the docs hosting seems to be broken, and I can't find the docs sources either. Do you know where I can find a copy of the full docs? I am looking for the C++ equivalents of Elemental.DistMatrix, Elemental.gaussian!, and svd so that I can replicate the example above in C++.

I am able to build libEl.

Cheers,
Johannes

@andreasnoack
Member

It looks like you can still browse the HTML version of the documentation, although it doesn't render correctly. I think the best place for you to look is https://github.com/LLNL/Elemental/blob/hydrogen/tests/lapack_like/SVD.cpp#L157. It should be possible to adapt that test into something similar to the example above.

@JBlaschke
Member Author

JBlaschke commented Oct 6, 2021

Thanks for the blob -- I'll try to understand it given the docs that I can find. At this point I only understand 10%. Btw, not all of the docs can be browsed: https://elemental.github.io/documentation/0.85/core/dist_matrix.html

@andreasnoack
Member

The source for the documentation is at https://github.com/elemental/elemental-web. I've asked your colleague at LLNL if they could start hosting the docs, since they are already maintaining the fork of Elemental; see LLNL/Elemental#80 (comment).

@JBlaschke
Member Author

Thanks! I'll also look into hosting that locally.

FTR: NERSC is at LBNL, and LBNL != LLNL. It's a common misunderstanding, and we are all friends.

@andreasnoack
Member

I had the pleasure of spending some days at NERSC a couple of years ago while working on a project where we ran Julia code on Cori, so I'm well aware that they are two different labs. The "colleague" remark was meant in the sense that you are both under DOE. The folks at Livermore forked Elemental a couple of years ago, so it would make sense for them to host the documentation, but if you don't mind doing it, that would also be great.
