
Providing MPI through BinaryBuilder #328

Closed
ViralBShah opened this issue Dec 20, 2019 · 32 comments · Fixed by #367

Comments

@ViralBShah
Member

I'm trying to get OpenMPI built through BinaryBuilder. For now I have successful builds, but until we can try them out we won't know what the issues are. Posting here for anyone who might be interested.

JuliaPackaging/Yggdrasil#308

@ViralBShah
Member Author

ViralBShah commented Dec 20, 2019

If OpenMPI_jll is automatically installed as a dependency of this package, I'm assuming that people will want an environment variable to decide whether to use it or the system-provided MPI.

I presume the default should be to use the system MPI if found, and fall back to our bundled one if mpiexec is not found on the path.
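The lookup order proposed above could be sketched as follows (a minimal sketch: the function name and the `JULIA_MPIEXEC` override variable are hypothetical, not MPI.jl's actual API):

```julia
# Hypothetical lookup order: explicit user override, then the system PATH,
# then the BinaryBuilder-bundled binary.
function choose_mpiexec(bundled_path::AbstractString)
    haskey(ENV, "JULIA_MPIEXEC") && return ENV["JULIA_MPIEXEC"]  # user override
    system = Sys.which("mpiexec")                                # system-provided MPI
    system !== nothing && return system
    return bundled_path                                          # fall back to the bundled one
end
```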

@simonbyrne
Member

You could try using MPICH instead of OpenMPI: this is closer to a "standard implementation", so might be easier to build. Note that Spack is considering changing their default due to frequent issues with OpenMPI: spack/spack#13259

Also MPI.jl currently assumes that you use MS MPI on Windows: we should change it so that other MPI implementations will work as well.

@ViralBShah
Member Author

Is MPICH well maintained nowadays? I can certainly package it. Would be great to get thoughts from some others here on this topic as well.

@ViralBShah
Member Author

To answer my own question, yes it is. https://github.com/pmodels/mpich

@ViralBShah ViralBShah changed the title Providing OpenMPI through BinaryBuilder Providing MPI through BinaryBuilder Dec 21, 2019
@ViralBShah
Member Author

MPICH_jll is now registered as well. It doesn't build on Windows either. It seems like MS MPI is the only real option on Windows.

JuliaPackaging/Yggdrasil#315

@vchuravy
Member

Before we can do this, we need to make sure that the overlay system works properly and that we can actually use vendor-provided MPI libraries.

@ViralBShah
Member Author

What are thoughts on making Julia 1.3 the minimum Julia version going forward?

@lcw
Member

lcw commented Dec 21, 2019 via email

@giordano
Member

In other words, how would they put mpirun/mpiexec into their path?

If you only want to call them from Julia, there are wrapper functions to call executables provided by JLL packages

@eschnett
Contributor

There should be a Julia-specific way to run the correct mpirun. For example, when configuring MPI, Julia could create a symbolic link from the config directory to the correct mpirun. (This is necessary since many HPC systems have many versions of MPI installed.)

@ViralBShah
Member Author

I was imagining that we use the system-provided mpirun, found either on the PATH or via a user-defined environment variable (as is currently the case in deps/build.jl). In the absence of both of those, use the Julia-bundled one.

@ViralBShah
Member Author

There should be a Julia-specific way to run the correct mpirun. For example, when configuring MPI, Julia could create a symbolic link from the config directory to the correct mpirun. (This is necessary since many HPC systems have many versions of MPI installed.)

I feel that recent experience suggests we should avoid symlinks, since they don't work on Windows. Not that any server-side system runs Windows, but it would be nice for the tooling and testing to work on all platforms. However, your larger point stands: we need a mechanism to pick one of the many MPI implementations.

@simonbyrne
Member

I think @eschnett meant that there needs to be a way for users to call mpirun from outside Julia

@ViralBShah
Member Author

Don't we already have a way to do that in the current setup? I imagine we need a good way to switch between that and the bundled one.

@simonbyrne
Member

We currently assume the MPI library provides one.

Another thing we need are the generated constants for the library in question.

@simonbyrne
Member

If you only want to call them from Julia, there are a wrapper functions to call executables provided by JLL packages

@giordano what are these wrapper functions?

@giordano
Member

giordano commented Jan 16, 2020

For each ExecutableProduct defined in the BinaryBuilder script, the JLL package provides a wrapper function to call the executable provided by the binary package. This is explained in the README.md of all JLL packages; see for example git. The wrapper locally sets the relevant environment variables (PATH and LD_LIBRARY_PATH, or their variants for each operating system) when calling the executable, so that it doesn't affect the rest of the environment; see again the example of git.
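As a concrete sketch of the wrapper pattern described above (using the do-block JLL API from this era; assumes Git_jll is installed):

```julia
using Git_jll

# The generated `git` wrapper temporarily prepends the artifact's directories
# to PATH and LD_LIBRARY_PATH (or the platform equivalent), runs the block,
# and then restores the environment.
git() do exe
    run(`$exe --version`)
end
```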

@simonbyrne
Member

Ah, ok: so it seems that the ExecutableProducts aren't registered for the launchers. I'll open an issue.

@simonbyrne
Member

simonbyrne commented Jan 16, 2020

Here is my vague plan (I'm not sure if it is feasible, so let me know):

  1. We create an MPI_jll which is basically an alias to MPICH_jll:

    • if we can't get MPICH to build on Windows, we will have to make it point to Microsoft MPI instead on Windows: this will require wrapping
    • is this possible without duplicating all of MPICH_jll?
  2. Users who want OpenMPI (provided by BinaryBuilder) would then override this to point to OpenMPI_jll

    • Can this be done for all versions, or would it need to be done on a per-version basis?
  3. Users who want to use a system MPI would use the usual override mechanism

    • one benefit of using MPICH as the base is that many other MPIs are ABI-compatible with it.

The main question is how to handle the logic in deps/gen_consts.jl for generating constants:

  • one option would be to move it into Yggdrasil and create an mpiconst_jl FileProduct: this seems somewhat elegant, but it would make overriding more difficult.

@ViralBShah
Member Author

MPICH should build for Windows, but the issue at the moment seems to be our cross-compilation setup.

@simonbyrne
Member

simonbyrne commented Jan 21, 2020

Update: We have MPICH working via BinaryProvider (#339), but not on Windows, and unfortunately the Overrides.toml doesn't work at the moment (JuliaPackaging/BinaryBuilder.jl#625).

simonbyrne added a commit that referenced this issue Jan 22, 2020
To prepare for switching to use BinaryBuilder-provided binaries (#328/#339), this deprecates `mpiexec` so that we can make it consistent. I've renamed it to `mpiexec_path`.
@simonbyrne
Member

We should probably have some extra check to try to detect if we're on a cluster, and warn users who are using the BB-provided MPI. The easiest is probably to check environment variables, e.g. SLURM_JOB_ID or PBS_JOBID. Are there any other common systems we should check for?
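A minimal sketch of such a check (the variable list is illustrative, not exhaustive):

```julia
# Job-scheduler environment variables that are typically set inside batch jobs.
const SCHEDULER_VARS = ["SLURM_JOB_ID", "PBS_JOBID", "LSB_JOBID"]

on_cluster() = any(haskey(ENV, v) for v in SCHEDULER_VARS)

if on_cluster()
    @warn "Cluster scheduler detected: the BinaryBuilder-provided MPI may not use the system interconnect."
end
```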

@ViralBShah
Member Author

ViralBShah commented Jan 22, 2020

Another way to do it might be to check whether an mpirun or mpiexec is present on the PATH.

@ViralBShah
Member Author

I assume that the environment variables are not set on the head node but only get set when you have submitted the job. I suspect one could also check if a ClusterManager is being used.

@simonbyrne
Member

I assume that the environment variables are not set on the head node but only get set when you have submitted the job.

That's fine: if someone is running MPI jobs on the head node, they've got bigger problems

@ViralBShah
Member Author

But isn't it too late to warn them when they are already running the job? Perhaps all we need is an INFO line that says which MPI is being used.

simonbyrne added a commit that referenced this issue Jan 23, 2020
To prepare for switching to use BinaryBuilder-provided binaries (#328/#339), this deprecates `mpiexec` so that we can make it consistent. I've renamed it to `mpiexec_path`.
@ViralBShah
Member Author

What's the next step here? Can we provide MPICH_jll as a dependency for this package? I would be ok with manual environment variable setting for using the system MPI.

@simonbyrne
Member

  1. Add code to define constants for the MPICH, OpenMPI, and MS MPI ABIs
  2. Figure out how to detect which of the ABIs is in use. Unfortunately there isn't a reliable way to do it (see Improve ABI for identifying an MPI implementation mpi-forum/mpi-issues#159), but we can call MPI_Get_library_version with a sufficiently large buffer (8192 is the largest value of MPI_MAX_LIBRARY_VERSION_STRING we have seen).
  3. If one of the above ABIs is used (which should cover 99% of cases), then we can use it; otherwise fall back on code generation.
  4. I think we should use the system MPI if one is detected, and otherwise use MPICH_jll (or potentially MS MPI, if we can provide it via BinaryBuilder).
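Step 2 could look roughly like this (a hedged sketch: `identify_abi` and the version-string prefixes are assumptions, and the call requires a loadable libmpi; MPI_Get_library_version may be called before MPI_Init, so the probe can run at load time):

```julia
# Probe the MPI ABI by matching the library version string against known
# implementations. The 8192-byte buffer exceeds every value of
# MPI_MAX_LIBRARY_VERSION_STRING we have seen.
function identify_abi(libmpi::String)
    buf = Vector{UInt8}(undef, 8192)
    len = Ref{Cint}(0)
    ccall((:MPI_Get_library_version, libmpi), Cint,
          (Ptr{UInt8}, Ref{Cint}), buf, len)
    version = String(buf[1:len[]])
    occursin("MPICH", version)         && return :MPICH
    occursin("Open MPI", version)      && return :OpenMPI
    occursin("Microsoft MPI", version) && return :MicrosoftMPI
    return :unknown
end
```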

@vchuravy
Member

vchuravy commented Mar 3, 2020

One other issue is library names. As seen in #356, MPI libraries use different names for their shared libraries, and the override system only allows us to specify alternative directory roots, so we need a more elaborate compatibility system.
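To illustrate the naming problem, a candidate-list search (the candidate names are illustrative):

```julia
using Libdl

# Each implementation ships a differently named shared library, so a plain
# directory override is not enough; we would have to search over known names.
candidates = ["libmpi", "libmpich", "msmpi"]
libmpi = Libdl.find_library(candidates)   # returns "" if none can be loaded
isempty(libmpi) && @warn "no known MPI shared library found"
```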

Two other things:

  1. One thing the CUDA ecosystem has moved towards is finding the libraries at startup instead of caching the result. @kpamnany recently ran into a situation where switching the MPI implementation was error-prone, since it required re-running `]build MPI`.
  2. The assumption that the place where MPI.jl is installed is writable is false. Only the first depot is writable, and on TXE-1 at MIT packages are routinely installed into a global non-writable depot.

@ViralBShah
Member Author

ViralBShah commented Apr 18, 2020

Related: #367 and JuliaPackaging/Yggdrasil#550

simonbyrne added a commit that referenced this issue May 7, 2020
Fixes #328 (as well as #373 and #376).

- Will use BinaryBuilder-provided MPICH / MicrosoftMPI by default
- Provides override mechanism to use system MPI
- Saves these preferences in ~/.julia/prefs/MPI.toml