Provide BB binaries #52

Closed
ViralBShah opened this issue Jan 22, 2020 · 28 comments

Comments

@ViralBShah
Member

Now that we are close to having MPI.jl use BB MPI binaries, it would be nice to have Elemental through BB too.

@andreasnoack
Member

How would that work with custom MPIs?

@ViralBShah
Member Author

I guess it would not. You would be on your own then. Quite likely, people with a custom MPI are also using custom compilers.

@andreasnoack
Member

I think that, right now, we'd be able to compile against the custom MPIs. I'm curious how MPI.jl handles this case.

@ViralBShah
Member Author

MPI.jl is easier. Use an environment variable or some other mechanism to pick BB or system.

JuliaParallel/MPI.jl#328

We can retain the scripts to compile against a system MPI.
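
For illustration, the environment-variable approach could look roughly like the sketch below. This is not MPI.jl's actual mechanism; the function name `binary_source` and the variable name `ELEMENTAL_BINARY` are hypothetical and chosen only for this example.

```julia
# Hypothetical helper: decide whether to use the BB-provided binaries or a
# system build. The variable name ELEMENTAL_BINARY is an assumption made for
# illustration; it is not an existing setting in Elemental.jl or MPI.jl.
function binary_source(env::AbstractDict = ENV)
    # Default to the BB binaries when the variable is not set.
    choice = lowercase(get(env, "ELEMENTAL_BINARY", "bb"))
    choice in ("bb", "system") ||
        error("ELEMENTAL_BINARY must be \"bb\" or \"system\", got \"$choice\"")
    return Symbol(choice)
end

binary_source(Dict("ELEMENTAL_BINARY" => "system"))  # returns :system
binary_source(Dict{String,String}())                 # returns :bb (the default)
```

A build script could branch on the returned symbol, keeping the existing system-MPI compile path for `:system` and loading the BB artifact otherwise.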

@ViralBShah
Member Author

You could also retain the code to build and link against a system MPI, while having BB provided Elemental as default.

I think a useful MPI setup built on common MPI libs that works out of the box would greatly reduce dev effort.

Do we know if people compile Elemental against system MPI through the supplied build scripts in this package?

@ViralBShah
Member Author

@christopher-dG has built BB binaries for Elemental and we already have MPICH. The next step is to update this package to use them.

@christopher-dG
Contributor

Is 1.0 compatibility a priority? It can be done but it's simpler to just drop it.

@ViralBShah
Member Author

ViralBShah commented Apr 9, 2020

Drop it. A lot of this ecosystem is young, and users are most likely researchers using Julia 1.3+.

@ViralBShah
Member Author

ViralBShah commented Apr 13, 2020

I just realized that Elemental has been forked by LLNL, and as per Elemental's GitHub page, the LLNL fork is the currently maintained one. It also has a more recent release:

https://github.com/LLNL/Elemental

I updated the README to point to it. It's good to have the current release in BB, and it would be nice to get the new one in as well (hopefully it doesn't break existing APIs).

@christopher-dG
Contributor

Yeah I saw it too, but went with the one that was already in use to avoid any breakage. In the future we should definitely look at the newer release.

@ViralBShah
Member Author

Yeah, I think that was the right decision. Use the one we know works for now, but then get the new one into BB as well to pave the way for updating to it.

@christopher-dG
Contributor

@ViralBShah
Member Author

Is it a matter of just one test failing, or just about everything? I guess we have to get into some internals. @andreasnoack Are you familiar with this codebase?

@christopher-dG
Contributor

Here's the most recent one: https://travis-ci.org/github/JuliaParallel/Elemental.jl/jobs/674568005

Some stuff is passing, but about half of the cases are causing segfaults. It's kind of hard to tell, though, because the tests don't live in a normal @testset (probably because they need to run via mpiexec).

@ViralBShah
Member Author

Do they fail on your local machine also?

@christopher-dG
Contributor

Yeah they do.

@ViralBShah
Member Author

I am unable to dev Elemental, perhaps because it doesn't have a Project.toml.

@christopher-dG
Contributor

christopher-dG commented Apr 13, 2020

Yeah, you'll have to clone manually and then check out my branch. I'm also working on a cdg/jll2 branch that contains more minimal changes, to try to make sure that I'm not accidentally breaking anything. (That branch is now the open PR.)

@ViralBShah
Member Author

I wonder if we can add a Project.toml so it's easier to dev. I'll check it out manually in the meanwhile.

@andreasnoack
Member

I'll try to add a project file shortly and then we can use that version to compare with.

@andreasnoack
Member

andreasnoack commented Apr 19, 2020

I finally managed to get CI passing and have merged #55, so now we have a project file. I had to temporarily disable testing on macOS but will try to enable it again once I've figured out which compilers I'm now supposed to use on Travis macOS.

@christopher-dG
Contributor

And this week I'll be taking a deep dive trying to get the BB-compiled libEl to work...

@ViralBShah
Member Author

ViralBShah commented Apr 19, 2020

FWIW, the first sequential code example in the README is broken, so I don't think it is a BB binaries issue. Try the second or the third one, both of which work. I filed the relevant issues.

@christopher-dG
Contributor

I've just been basing it on the tests. About half of them segfault or something along those lines.

@ViralBShah
Member Author

Yeah - then there is no option but to dive in and fix the issues.

@andreasnoack
Member

The broken example was just because the convert method used the old array constructor without the undef argument (see #60), so if there are segfaults it's definitely something deeper.
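
For readers unfamiliar with the constructor change mentioned above, a minimal sketch of the difference follows. It uses a plain `Array` rather than the actual convert method from #60, which is not reproduced here.

```julia
m, n = 2, 3

# Julia 0.6 and earlier accepted the dimensions alone:
#     Array{Float64}(m, n)
# From Julia 1.0 on, that call is a MethodError; the uninitialized-array
# constructor requires an explicit `undef` as its first argument.
A = Array{Float64}(undef, m, n)

size(A)  # (2, 3); the element values themselves are uninitialized
```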

@ViralBShah
Member Author

Should we tag a new release now that we have the BB binaries? Perhaps open new issues for segfaults and such as we encounter them?

@andreasnoack
Member

#54 would need to get merged first, right?


3 participants