wdmerger is non-deterministic and results depend on number of OpenMP threads #646
Comments
Attaching inputs/probin files.
This also does not occur with gravity.max_multipole_order=0 at this resolution, but it does occur at a higher resolution (amr.n_cell=64 64 64; amr.blocking_factor=16).
From bisection it looks like this started in January 2019 with the merge of #493.
We've also had an issue in the past with this limiter (#132).
If I disable the OpenMP threading of ca_compute_multipole_moments, then the results are deterministic again. So the focus on the flux limiter appears to have been misleading, and I'm now looking at the multipole BCs.
An issue that is likely related is that if I disable tiling of ca_compute_multipole_moments, the results differ from the tiled case, even when OpenMP is disabled. This also seems to happen only when we're using the hydro flux limiter.
Yet more weirdness: the answer differs if I use 8 MPI ranks instead of 16 MPI ranks, even though this problem only has 8 boxes, so the extra 8 ranks should not be doing anything.
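For context on why the thread count, tile layout, or rank count can change the answer at all: the multipole moments are global sums accumulated by parallel reductions, and floating-point addition is not associative, so a different decomposition groups the partial sums differently and perturbs the last few bits of the result. That alone would normally stay at roundoff level; the comments below suggest the flux limiter is what amplifies it into a different timestep count. A standalone illustration (plain C++, not Castro code):

```cpp
// Standalone demonstration: floating-point addition is not associative, so a
// parallel reduction's result depends on how the terms are grouped -- i.e.,
// on the number of threads/tiles/ranks computing the partial sums.
#include <cstdio>
#include <random>
#include <vector>

int main () {
    std::vector<double> x(1000000);
    std::mt19937_64 gen(42);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    for (auto& v : x) v = dist(gen);

    // Serial sum: one fixed association order.
    double serial = 0.0;
    for (double v : x) serial += v;

    // "Two-thread" sum: two partial sums combined at the end.
    double s0 = 0.0, s1 = 0.0;
    for (std::size_t i = 0; i < x.size() / 2; ++i) s0 += x[i];
    for (std::size_t i = x.size() / 2; i < x.size(); ++i) s1 += x[i];
    double partitioned = s0 + s1;

    // The two results typically differ in the last bits -- enough to seed a
    // divergence in a chaotic problem if anything downstream is sensitive.
    std::printf("serial      = %.17g\n", serial);
    std::printf("partitioned = %.17g\n", partitioned);
    std::printf("difference  = %.3e\n", serial - partitioned);
}
```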
I can reproduce the (MPI-only) effect when using the monopole BCs instead of the multipole BCs: a run with a different number of ranks does not give the same result as one using 32 ranks. It also yields different results when using MonopoleGrav instead of PoissonGrav. So the evidence is pointing back to the flux limiter again (the effect does not occur when it is off).
The effect does not occur if I use a CFL number of 0.1, but does still occur if I keep the default CFL and replace the CFL number used in the flux limiter (for constructing the Lax-Friedrichs flux) with 0.1.
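For readers unfamiliar with this limiter: the fallback it blends toward is, as I understand it, a local Lax-Friedrichs flux whose dissipation speed is tied to the CFL number, since under a CFL-limited timestep the quantity cfl * dx / dt bounds the fastest signal speed. A minimal sketch in my own notation (the function name and exact coefficient are illustrative; the expression in the Castro source may differ):

```cpp
// Sketch of a CFL-based local Lax-Friedrichs flux of the kind the limiter
// falls back on (illustrative, not the Castro source).
#include <array>
#include <cstdio>

constexpr int NCOMP = 5;                 // e.g. rho, rho*u, rho*v, rho*w, rho*E
using State = std::array<double, NCOMP>;

// fL, fR: physical fluxes from the left/right interface states
// uL, uR: left/right conserved interface states
State lax_friedrichs_flux (const State& fL, const State& fR,
                           const State& uL, const State& uR,
                           double dx, double dt, double cfl)
{
    const double alpha = cfl * dx / dt;  // upper bound on the signal speed
    State fLF;
    for (int n = 0; n < NCOMP; ++n) {
        fLF[n] = 0.5 * (fL[n] + fR[n]) - 0.5 * alpha * (uR[n] - uL[n]);
    }
    return fLF;
}

int main () {
    // Made-up interface states and fluxes, just to exercise the function.
    State uL{1.0, 0.5, 0.0, 0.0, 2.5}, uR{0.8, 0.3, 0.0, 0.0, 2.0};
    State fL{0.5, 1.2, 0.0, 0.0, 1.6}, fR{0.3, 0.9, 0.0, 0.0, 1.1};
    State f = lax_friedrichs_flux(fL, fR, uL, uR, 1.0e-2, 1.0e-5, 0.5);
    std::printf("density component of F_LF = %g\n", f[0]);
}
```

The experiment in the comment above then amounts to changing cfl only inside this construction while leaving the timestep's CFL alone.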
There is a bug in the flux limiter:
The problem with this calculation is that the (lo, hi) for this routine is nodal. Another problem with this calculation (which is unrelated to this issue) is that for non-Cartesian coordinates it will be wrong -- we need to replace that coefficient with separate coefficients for the left and right sides of the interface, which have different volumes.
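To illustrate the second point (with hypothetical names, not the actual Castro routine): in a finite-volume update the same interface flux enters the two adjacent cells weighted by the interface area and each cell's own volume, so in curvilinear coordinates a single dt/dx-style coefficient cannot serve both sides.

```cpp
// Illustration of why the limiter coefficient must be one-sided in
// non-Cartesian geometry (hypothetical names, not the Castro code).
#include <cstdio>

struct InterfaceGeom {
    double area;       // face area of the interface
    double vol_left;   // volume of the cell to the left of the interface
    double vol_right;  // volume of the cell to the right of the interface
};

// Density change in each adjacent cell due to the flux F through this face.
void interface_contributions (double F, double dt, const InterfaceGeom& g,
                              double& drho_left, double& drho_right)
{
    drho_left  = -dt * g.area * F / g.vol_left;   // left cell loses what flows out
    drho_right =  dt * g.area * F / g.vol_right;  // right cell gains it
    // In Cartesian geometry vol_left == vol_right == area * dx, so both
    // coefficients reduce to dt/dx; in cylindrical/spherical geometry the
    // two volumes differ and a single coefficient is no longer correct.
}

int main () {
    InterfaceGeom g{1.0, 2.0, 3.0};   // arbitrary area, unequal volumes
    double dl, dr;
    interface_contributions(0.5, 1.0e-3, g, dl, dr);
    std::printf("drho_left = %g, drho_right = %g\n", dl, dr);
}
```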
Another (less important) issue is that the Lax-Friedrichs flux does not zero out the UTEMP and USHK components. This matters because the zeroing of the UTEMP and USHK components in the CTU hydro advance comes before the call to this flux limiter.
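Schematically, the fix amounts to something like the following (UTEMP and USHK are Castro's conserved-state indices for temperature and the shock flag; the enum values and function here are placeholders, not the real definitions):

```cpp
// Illustrative placeholder indices and routine, not Castro's definitions.
// The point: the Lax-Friedrichs fallback flux should carry no UTEMP/USHK
// contribution, because the CTU advance zeroes those flux components
// *before* this limiter runs, so a nonzero LF contribution would be
// inconsistent with the high-order flux it is blended against.
#include <array>

enum StateIndex { URHO, UMX, UMY, UMZ, UEDEN, UTEMP, USHK, NUM_STATE };

inline void zero_nonadvected_components (std::array<double, NUM_STATE>& fluxLF)
{
    fluxLF[UTEMP] = 0.0;   // no temperature flux through this interface
    fluxLF[USHK]  = 0.0;   // no shock-flag flux either
}
```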
The issues from the previous two comments have been resolved. The original issue is still present. It can apparently be fixed by only updating drhoLFL and drhoLFR if fluxLF violates the density floor on either of the respective interfaces, but it is unclear why this would be a correct thing to do.
This resolves the issue in #646. We were getting non-deterministic behavior because we were mixing and matching updates from the left and right side of the interface. By only applying the limiter to whichever side of the interface would go negative, and thus only obtaining a single theta for the limiter, we sidestep the issue and also simplify the code.
I believe this is now resolved. The issue appears to have occurred because we were not calculating the limiter in a self-consistent manner -- the Lax-Friedrichs flux was being updated from both sides of the interface even though the theta values are supposed to be one-sided. I have made the code consistent so that it only uses a one-sided theta to calculate the update, and in doing so I realized that we don't need two theta values, only one, so I simplified the code as well. There is still likely work to do on this limiter, because it results in more timesteps on the above test problem than not using it, but at least the non-determinism is now gone.
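To make the resolution concrete, here is a sketch in my own notation (not the actual Castro implementation): the limited interface flux is the blend F = theta * F_HO + (1 - theta) * F_LF, and a single theta per interface is taken from whichever adjacent cell would fall below the density floor under the high-order flux, so both cells see the same limited flux. The sketch assumes the Lax-Friedrichs update itself respects the floor.

```cpp
// Sketch of one-sided limiting: a single theta per interface, chosen from
// the side that would violate the density floor (illustrative, not the
// actual Castro implementation).
#include <algorithm>
#include <cstdio>
#include <initializer_list>

struct Side {
    double rho_HO;   // density this cell would have using the high-order flux
    double rho_LF;   // density this cell would have using the Lax-Friedrichs flux
};

// Returns the blending factor theta in [0, 1] for this interface.
double one_sided_theta (const Side& left, const Side& right, double small_dens)
{
    double theta = 1.0;                      // 1 => keep the high-order flux

    for (const Side& s : {left, right}) {
        if (s.rho_HO < small_dens) {
            // The post-update density is linear in theta:
            //   rho(theta) = theta * rho_HO + (1 - theta) * rho_LF,
            // so solve rho(theta) = small_dens (assumes rho_LF >= small_dens).
            double t = (s.rho_LF - small_dens) / (s.rho_LF - s.rho_HO);
            theta = std::min(theta, std::clamp(t, 0.0, 1.0));
        }
    }
    return theta;                            // applied identically on both sides
}

int main () {
    // Left cell would dip below the floor under the high-order flux; right is fine.
    Side left{-1.0e-6, 2.0e-5}, right{3.0e-4, 4.0e-4};
    std::printf("theta = %g\n", one_sided_theta(left, right, 1.0e-5));
}
```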
When compiled using MPI + OpenMP with either PGI or GNU, wdmerger's results depend on the number of OpenMP threads and are non-deterministic in the multi-threading case. (This is also seen with CUDA.) I am compiling with
and running on a Summit-like system with either
or
The runs vary in the number of timesteps taken to reach the final simulation time. If I run with OMP_NUM_THREADS=1 and varying numbers of MPI ranks, the results do not differ.
This effect does not occur with do_grav=0, but it is not clear if that means the issue is specifically in gravity or the simulation is just so different without gravity that the issue doesn't have a chance to manifest.