convert CTU hydro subroutines to C++ #525

zingale · 2019-02-26T17:48:56Z

For best portability, especially with GPUs, we should start converting the various hydro routines over to C++. We can do this piece by piece, test them one-by-one, and merge.

zingale · 2020-01-10T02:39:50Z

some easy targets to start with:

ca_srctoprim
ca_compute_cfl
limit_hydro_fluxes_on_small_dens
limit_hydro_fluxes_on_large_vel
ca_shock
divu
normalize_species_fluxes
apply_av
scale_flux

zingale · 2020-01-12T15:56:51Z

we need the upass_map stuff in C++ and need to make the qaux stuff available in C++ as well.

zingale · 2020-02-11T15:39:17Z

this is in progress in PRs #731, #756, #761

certik · 2020-04-15T20:22:42Z

@zingale we are trying to build a Fortran community and tooling (https://fortran-lang.org) with the aim of eventually fixing some of the reasons why people currently have to move to C++. That is a long-term, multi-year effort.

I was wondering if you would share with us some of the things that Fortran should fix, from your point of view, so that you could continue using it, instead of rewriting to C++?

It seems it's portability and GPUs. For the portability part, I would be interested in more details, e.g. what platforms / compilers don't work well, etc.

For the GPU I think I can guess: Fortran currently does not have a good solution, besides CUDA Fortran, which only works with some GPUs and some compilers. That's something I would like to help fix with LFortran down the road.

zingale · 2020-04-15T22:59:12Z

Hi Ondrej, Castro builds off of AMReX which is written in C++ and provides a lot of high-level abstractions that make it easy to run on CPUs or GPUs. Initially we built our own method to offload Fortran routines to GPUs, but it is too much to maintain, so we chose to move to C++ so we can take advantage of the features of AMReX and lessen our own development burden.

certik · 2020-04-15T23:35:50Z

Thanks Michael. Yes, we use AMReX also for some of our stuff at LANL (we also have it in Fortran).

It seems the conclusion here is that Fortran compilers and related tooling maintained by the community (as opposed to by the Castro team) should handle the GPU offload, robustly and painlessly. That would go a long way.

That might not fully fix all the issues with codes like Castro because my own experience with projects that use multiple languages (whether C++ and Python, or C/Cython/Python or C++ and Fortran) is that it adds lots of complexity and requires people to know both languages well, as well as how they interact. And it's much easier to just stick to one language (say C++) and do everything in it.

So perhaps part of the story where Fortran can improve is also easier interaction with C++ libraries, so that projects like Castro don't have to maintain complicated wrappers, and if things can simplify both for AMReX developers as well as their users, then perhaps it would be more viable to use Fortran with AMReX.

maximumcats · 2020-04-18T16:35:48Z

Yes, improved interoperability between C++ and Fortran would go a long way toward improving what we were trying to do. The use case of creating GPU kernels from C++ and calling Fortran functions on a per-element basis as device functions did not work very efficiently for us, in part because we were gluing together two programming models (CUDA C++ and CUDA Fortran) whose design principles did not include this paradigm.

maximumcats · 2020-04-18T17:42:59Z

However, there are some programming methods that are just inherently easier to implement in C++. The case we care most about is that we have certain physics functions that are evaluated on a per-element basis across all of the elements in our computational domain, and these functions dominate the computational expense in certain cases. In Fortran these are most naturally expressed as module functions, allowing no opportunity to inline and thus efficiently optimize these function calls. In C++ we can (and did) implement these functions as a header-only library, allowing for efficient inlining. It's not clear to us that this is important on traditional CPU platforms, but we believe it is a major issue for GPU code.

If link-time optimization were available that was approximately as efficient as we would get from direct inlining of the C++ source, that would also have gone a long way toward making our Fortran implementation viable.
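
To make the header-only pattern concrete, here is a minimal sketch. It is not Castro or AMReX code: the names `gamma_law_pressure` and `compute_pressure` and the gamma-law formula are hypothetical stand-ins for the per-element physics functions described above, and the domain is assumed to be a flat array rather than an AMReX box.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-element physics function (a gamma-law pressure),
// standing in for the real physics routines. Defining it `inline` in a
// header makes its body visible in every translation unit that includes
// it, so the compiler is free to inline the call at each use site.
inline double gamma_law_pressure(double rho, double e, double gamma = 5.0 / 3.0)
{
    return (gamma - 1.0) * rho * e;
}

// Hypothetical driver: evaluate the per-element function for every
// element of a flattened domain.
inline std::vector<double> compute_pressure(const std::vector<double>& rho,
                                            const std::vector<double>& e)
{
    assert(rho.size() == e.size());
    std::vector<double> p(rho.size());
    for (std::size_t i = 0; i < rho.size(); ++i) {
        // The callee's definition is visible here, so this call can be
        // inlined into the loop body rather than made through a
        // separately compiled module.
        p[i] = gamma_law_pressure(rho[i], e[i]);
    }
    return p;
}
```

Whether the call is actually inlined remains the compiler's decision; the point of the sketch is only that the definition is available at the call site without relying on link-time optimization.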

certik · 2020-04-18T17:54:58Z

@maxpkatz thank you for this feedback. As a Fortran user, I too share pretty much the same frustrations. Regarding inlining, why couldn't Fortran compilers simply inline module functions where it makes sense? I think link-time inlining is too late. I don't know, I just see a lot of opportunities to do things vastly better. I started a new compiler called LFortran (https://lfortran.org/). I am not yet at the point of tackling some of these issues, but it is my plan to do that. I will keep you updated.

zingale added the GPU label Mar 6, 2019

zingale closed this as completed Feb 11, 2020
