
Allow arguments to JULIA_MPIEXEC #344

Closed
omlins opened this issue Jan 28, 2020 · 9 comments · Fixed by #345

Comments

@omlins
Contributor

omlins commented Jan 28, 2020

Building MPI with cray-mpich (v7.7.10) works perfectly well, except for the following little issue.

Due to the configuration of our scheduler (SLURM), I need to manually edit deps/build.jl and add -C gpu or -C mc to line 89 (note that I set JULIA_MPIEXEC=srun). That is, I need to replace

run(`$mpiexec -n 1 ./gen_consts`)

with e.g.

run(`$mpiexec -C mc -n 1 ./gen_consts`)

Thus, it would be good to allow arguments to $mpiexec for this run command. Even though the fix is very simple, it is important for us: we use automatic build scripts, and having to modify the source code is an obstacle.

Thanks!

Here is the error message for completeness:

  Building MPI → `~/.julia/1.3.1/daint-mc/packages/MPI/hcbnk/deps/build.log`
┌ Error: Error building `MPI`: 
│ srun: error: You have to specify, at least, what sort of node you need: -C gpu for GPU enabled nodes, or -C mc for multicore nodes.
│ srun: error: Other features are possible, but 'gpu' and 'mc' are exclusive.
│ srun: error: Unable to allocate resources: Requested node configuration is not available
│ [ Info: Using MPI library /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so
│ ERROR: LoadError: failed process: Process(`srun -n 1 ./gen_consts`, ProcessExited(1)) [1]
@omlins
Contributor Author

omlins commented Jan 28, 2020

The simplest fix would probably be the following:

run(`$(split(mpiexec)) -n 1 ./gen_consts`)

However, maybe it is worthwhile to introduce something like JULIA_MPIEXEC_ARGS?
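For illustration, here is a minimal sketch of how deps/build.jl could pick up such a variable (the name JULIA_MPIEXEC_ARGS and the plain whitespace splitting are assumptions, not necessarily what #345 implements):

mpiexec = get(ENV, "JULIA_MPIEXEC", "mpiexec")
mpiexec_args = split(get(ENV, "JULIA_MPIEXEC_ARGS", ""))  # naive whitespace split; empty vector if unset

# interpolating a vector expands it into separate words of the command
run(`$mpiexec $mpiexec_args -n 1 ./gen_consts`)

With JULIA_MPIEXEC=srun and JULIA_MPIEXEC_ARGS="-C mc" this would run srun -C mc -n 1 ./gen_consts.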

@vchuravy
Member

export JULIA_MPIEXEC="srun -C mc" didn't work? Probably because command interpolation treats the whole string as a single atom.

@omlins
Contributor Author

omlins commented Jan 28, 2020

@vchuravy, no, this did not work; it produces 'srun -C mc' -n 1 ./gen_consts, in agreement with the Julia documentation here.
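For reference, a small REPL sketch of the two behaviours (my own illustration, not from the linked doc): interpolating a string keeps it as one word, while interpolating a vector expands it into separate words.

julia> mpiexec = "srun -C mc";

julia> `$mpiexec -n 1 ./gen_consts`           # string stays a single word
`'srun -C mc' -n 1 ./gen_consts`

julia> `$(split(mpiexec)) -n 1 ./gen_consts`  # vector expands into separate words
`srun -C mc -n 1 ./gen_consts`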

@simonbyrne
Member

However, maybe it is worthwhile to introduce something like JULIA_MPIEXEC_ARGS?

I'd be happy to add something like that, but the question is how spaces should be handled: if we split on all spaces, we lose the ability to pass arguments that contain quoted spaces.

I'm hoping to move away from the gen_consts script in cases where we have a known ABI, which should address some of these issues.
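To make the quoting concern concrete, a small illustration (the flag and variable are just examples; Base.shell_split is an unexported Base helper, so relying on it would be a judgement call):

args = "--export=GREETING='hello world' -C mc"

split(args)             # naive whitespace split breaks the quoted value:
                        # ["--export=GREETING='hello", "world'", "-C", "mc"]

Base.shell_split(args)  # respects shell-style quoting:
                        # ["--export=GREETING=hello world", "-C", "mc"]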

@omlins
Contributor Author

omlins commented Jan 28, 2020

Thanks @vchuravy and @simonbyrne for your comments!

Maybe quotes could be used when we want to avoid splitting on all spaces?
For example, JULIA_MPIEXEC_ARGS="'-C gpu' '--nodelist=host1, host2'"
would split into '-C gpu' and '--nodelist=host1, host2'.
Alternatively, it might be clearer to split on commas, i.e. we would write:
JULIA_MPIEXEC_ARGS="'-C gpu', '--nodelist=host1, host2'"

Do you see any problem with that?
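A hypothetical helper for the quoting scheme proposed above (parse_mpiexec_args is a made-up name; it simply collects the single-quoted pieces, so the comma separator becomes optional):

function parse_mpiexec_args(s::AbstractString)
    # collect everything between pairs of single quotes
    return [m.captures[1] for m in eachmatch(r"'([^']*)'", s)]
end

parse_mpiexec_args("'-C gpu' '--nodelist=host1, host2'")
# returns ["-C gpu", "--nodelist=host1, host2"]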

@simonbyrne
Member

@omlins Can you try checking out the sb/mpiexec_args branch, and running Pkg.build on it?

@omlins
Contributor Author

omlins commented Jan 30, 2020

@simonbyrne, it failed. I made a comment on your branch: 68d8c1d

@simonbyrne
Member

Ah, sorry, I have fixed it now; please try again.

@omlins
Contributor Author

omlins commented Jan 31, 2020

Building worked now:

(1.3.1-daint-gpu) pkg> build MPI
  Building MPI → `~/.julia/1.3.1/daint-gpu/dev/MPI/deps/build.log`
 Resolving package versions...

julia> 
omlins@daint106:~/.julia/1.3.1/daint-gpu/dev/MPI> cat ~/.julia/1.3.1/daint-gpu/dev/MPI/deps/build.log
srun: job 19795846 queued and waiting for resources
srun: job 19795846 has been allocated resources
[ Info: Using MPI library /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so

However, testing fails, as the JULIA_MPIEXEC_ARGS variable is not yet used there (I guess it should be used in every file where JULIA_MPIEXEC is used; see the sketch after the log below):

(1.3.1-daint-gpu) pkg> test MPI
   Testing MPI
 Resolving package versions...
 Installed GenericSVD ─────── v0.2.2
 Installed OpenSpecFun_jll ── v0.5.3+1
 Installed RecipesBase ────── v0.7.0
 Installed Quadmath ───────── v0.5.2
 Installed GenericSchur ───── v0.3.0
 Installed SpecialFunctions ─ v0.10.0
 Installed Polynomials ────── v0.6.0
 Installed DoubleFloats ───── v1.0.3
    Status `/tmp/jl_s7tfge/Manifest.toml`
  [ffbed154] DocStringExtensions v0.8.1
  [497a8b3b] DoubleFloats v1.0.3
  [01680d73] GenericSVD v0.2.2
  [c145ed77] GenericSchur v0.3.0
  [da04e1cc] MPI v0.11.0 [`~/.julia/1.3.1/daint-gpu/dev/MPI`]
  [efe28fd5] OpenSpecFun_jll v0.5.3+1
  [f27b6e38] Polynomials v0.6.0
  [be4d8f0f] Quadmath v0.5.2
  [3cdcf5f2] RecipesBase v0.7.0
  [ae029012] Requires v0.5.2
  [276daf66] SpecialFunctions v0.10.0
  [2a0f44e3] Base64  [`@stdlib/Base64`]
  [ade2ca70] Dates  [`@stdlib/Dates`]
  [8ba89e20] Distributed  [`@stdlib/Distributed`]
  [b77e0a4c] InteractiveUtils  [`@stdlib/InteractiveUtils`]
  [76f85450] LibGit2  [`@stdlib/LibGit2`]
  [8f399da3] Libdl  [`@stdlib/Libdl`]
  [37e2e46d] LinearAlgebra  [`@stdlib/LinearAlgebra`]
  [56ddb016] Logging  [`@stdlib/Logging`]
  [d6f4376e] Markdown  [`@stdlib/Markdown`]
  [44cfe95a] Pkg  [`@stdlib/Pkg`]
  [de0858da] Printf  [`@stdlib/Printf`]
  [3fa0cd96] REPL  [`@stdlib/REPL`]
  [9a3f8284] Random  [`@stdlib/Random`]
  [ea8e919c] SHA  [`@stdlib/SHA`]
  [9e88b42a] Serialization  [`@stdlib/Serialization`]
  [6462fe0b] Sockets  [`@stdlib/Sockets`]
  [8dfed614] Test  [`@stdlib/Test`]
  [cf7118a7] UUIDs  [`@stdlib/UUIDs`]
  [4ec0a83e] Unicode  [`@stdlib/Unicode`]
Precompiling project...
Precompiling MPI
Precompiling DoubleFloats
Running MPI.jl tests
srun: error: You have to specify, at least, what sort of node you need: -C gpu for GPU enabled nodes, or -C mc for multicore nodes.
srun: error: Other features are possible, but 'gpu' and 'mc' are exclusive.
srun: error: Unable to allocate resources: Requested node configuration is not available
ERROR: LoadError: failed process: Process(`srun -n 4 /scratch/snx3000/omlins/9_soft_install_julia_1_3/software/Julia/1.3.1-CrayGNU-19.10-cuda-10.1/bin/julia --code-coverage=none /users/omlins/.julia/1.3.1/daint-gpu/dev/MPI/test/test_allgather.jl`, ProcessExited(1)) [1]

Stacktrace:
 [1] pipeline_error at ./process.jl:525 [inlined]
 [2] #run#565(::Bool, ::typeof(run), ::Cmd) at ./process.jl:440
 [3] run at ./process.jl:438 [inlined]
 [4] runtests() at /users/omlins/.julia/1.3.1/daint-gpu/dev/MPI/test/runtests.jl:44
 [5] top-level scope at /users/omlins/.julia/1.3.1/daint-gpu/dev/MPI/test/runtests.jl:53
 [6] include at ./boot.jl:328 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1105
 [8] include(::Module, ::String) at ./Base.jl:31
 [9] include(::String) at ./client.jl:424
 [10] top-level scope at none:6
in expression starting at /users/omlins/.julia/1.3.1/daint-gpu/dev/MPI/test/runtests.jl:53
ERROR: Package MPI errored during testing
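For completeness, a sketch of the kind of change test/runtests.jl would need (the names and the whitespace splitting are illustrative assumptions, not the code from the sb/mpiexec_args branch):

mpiexec      = get(ENV, "JULIA_MPIEXEC", "mpiexec")
mpiexec_args = split(get(ENV, "JULIA_MPIEXEC_ARGS", ""))
nprocs       = 4

# splice the extra launcher flags into each test command as well
run(`$mpiexec $mpiexec_args -n $nprocs $(Base.julia_cmd()) --code-coverage=none test_allgather.jl`)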
