wdmerger is non-deterministic and results depend on number of OpenMP threads #646
Comments
Attaching inputs/probin files.
This also does not occur with gravity.max_multipole_order=0 at this resolution, but it does occur at a higher resolution (amr.n_cell=64 64 64; amr.blocking_factor=16).
From bisection it looks like this started in January 2019 with the merge of #493.
We've also had an issue in the past with this limiter (#132).
If I disable the OpenMP threading of ca_compute_multipole_moments, then the results are deterministic again. So the focus on the flux limiter appears to have been misleading, and I'm now looking at the multipole BCs.
An issue that is likely related is that if I disable tiling of ca_compute_multipole_moments, the results differ from the tiled case, even when OpenMP is disabled. This also seems to happen only when we're using the hydro flux limiter.
Yet more weirdness: the answer differs if I use 8 MPI ranks instead of 16 MPI ranks, even though this problem only has 8 boxes, so the extra 8 ranks should not be doing anything.
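For context on why the thread count, tile layout, or rank count can change the answer at all: the multipole moments are global sums accumulated by parallel reductions, and floating-point addition is not associative, so a different decomposition groups the partial sums differently and perturbs the last few bits of the result. That alone would normally stay at roundoff level; the comments below suggest the flux limiter is what amplifies it into a different timestep count. A standalone illustration (plain C++, not Castro code):

```cpp
// Standalone demonstration: floating-point addition is not associative, so a
// parallel reduction's result depends on how the terms are grouped -- i.e.,
// on the number of threads/tiles/ranks computing the partial sums.
#include <cstdio>
#include <random>
#include <vector>

int main () {
    std::vector<double> x(1000000);
    std::mt19937_64 gen(42);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    for (auto& v : x) v = dist(gen);

    // Serial sum: one fixed association order.
    double serial = 0.0;
    for (double v : x) serial += v;

    // "Two-thread" sum: two partial sums combined at the end.
    double s0 = 0.0, s1 = 0.0;
    for (std::size_t i = 0; i < x.size() / 2; ++i) s0 += x[i];
    for (std::size_t i = x.size() / 2; i < x.size(); ++i) s1 += x[i];
    double partitioned = s0 + s1;

    // The two results typically differ in the last bits -- enough to seed a
    // divergence in a chaotic problem if anything downstream is sensitive.
    std::printf("serial      = %.17g\n", serial);
    std::printf("partitioned = %.17g\n", partitioned);
    std::printf("difference  = %.3e\n", serial - partitioned);
}
```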
I can reproduce the (MPI-only) effect when using the monopole BCs instead of the multipole BCs: a run with a different number of ranks does not give the same result as one using 32 ranks. It also yields different results when using MonopoleGrav instead of PoissonGrav. So the evidence is pointing back to the flux limiter again (the effect does not occur when it is off).
The effect does not occur if I use a CFL number of 0.1, but does still occur if I keep the default CFL and replace the CFL number used in the flux limiter (for constructing the Lax-Friedrichs flux) with 0.1.
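For readers unfamiliar with this limiter: the fallback it blends toward is, as I understand it, a local Lax-Friedrichs flux whose dissipation speed is tied to the CFL number, since under a CFL-limited timestep the quantity cfl * dx / dt bounds the fastest signal speed. A minimal sketch in my own notation (the function name and exact coefficient are illustrative; the expression in the Castro source may differ):

```cpp
// Sketch of a CFL-based local Lax-Friedrichs flux of the kind the limiter
// falls back on (illustrative, not the Castro source).
#include <array>
#include <cstdio>

constexpr int NCOMP = 5;                 // e.g. rho, rho*u, rho*v, rho*w, rho*E
using State = std::array<double, NCOMP>;

// fL, fR: physical fluxes from the left/right interface states
// uL, uR: left/right conserved interface states
State lax_friedrichs_flux (const State& fL, const State& fR,
                           const State& uL, const State& uR,
                           double dx, double dt, double cfl)
{
    const double alpha = cfl * dx / dt;  // upper bound on the signal speed
    State fLF;
    for (int n = 0; n < NCOMP; ++n) {
        fLF[n] = 0.5 * (fL[n] + fR[n]) - 0.5 * alpha * (uR[n] - uL[n]);
    }
    return fLF;
}

int main () {
    // Made-up interface states and fluxes, just to exercise the function.
    State uL{1.0, 0.5, 0.0, 0.0, 2.5}, uR{0.8, 0.3, 0.0, 0.0, 2.0};
    State fL{0.5, 1.2, 0.0, 0.0, 1.6}, fR{0.3, 0.9, 0.0, 0.0, 1.1};
    State f = lax_friedrichs_flux(fL, fR, uL, uR, 1.0e-2, 1.0e-5, 0.5);
    std::printf("density component of F_LF = %g\n", f[0]);
}
```

The experiment in the comment above then amounts to changing cfl only inside this construction while leaving the timestep's CFL alone.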
There is a bug in the flux limiter:
The problem with this calculation is that the (lo, hi) for this routine is nodal. Another problem with this calculation (which is unrelated to this issue) is that for non-Cartesian coordinates it will be wrong -- we need to replace that coefficient with separate coefficients for the left and right sides of the interface, which have different volumes.
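To illustrate the second point (with hypothetical names, not the actual Castro routine): in a finite-volume update the same interface flux enters the two adjacent cells weighted by the interface area and each cell's own volume, so in curvilinear coordinates a single dt/dx-style coefficient cannot serve both sides.

```cpp
// Illustration of why the limiter coefficient must be one-sided in
// non-Cartesian geometry (hypothetical names, not the Castro code).
#include <cstdio>

struct InterfaceGeom {
    double area;       // face area of the interface
    double vol_left;   // volume of the cell to the left of the interface
    double vol_right;  // volume of the cell to the right of the interface
};

// Density change in each adjacent cell due to the flux F through this face.
void interface_contributions (double F, double dt, const InterfaceGeom& g,
                              double& drho_left, double& drho_right)
{
    drho_left  = -dt * g.area * F / g.vol_left;   // left cell loses what flows out
    drho_right =  dt * g.area * F / g.vol_right;  // right cell gains it
    // In Cartesian geometry vol_left == vol_right == area * dx, so both
    // coefficients reduce to dt/dx; in cylindrical/spherical geometry the
    // two volumes differ and a single coefficient is no longer correct.
}

int main () {
    InterfaceGeom g{1.0, 2.0, 3.0};   // arbitrary area, unequal volumes
    double dl, dr;
    interface_contributions(0.5, 1.0e-3, g, dl, dr);
    std::printf("drho_left = %g, drho_right = %g\n", dl, dr);
}
```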
Another (less important) issue is that the Lax-Friedrichs flux does not zero out the UTEMP and USHK components. This matters because the zeroing of the UTEMP and USHK components in the CTU hydro advance comes before the call to this flux limiter.
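Schematically, the fix amounts to something like the following (UTEMP and USHK are Castro's conserved-state indices for temperature and the shock flag; the enum values and function here are placeholders, not the real definitions):

```cpp
// Illustrative placeholder indices and routine, not Castro's definitions.
// The point: the Lax-Friedrichs fallback flux should carry no UTEMP/USHK
// contribution, because the CTU advance zeroes those flux components
// *before* this limiter runs, so a nonzero LF contribution would be
// inconsistent with the high-order flux it is blended against.
#include <array>

enum StateIndex { URHO, UMX, UMY, UMZ, UEDEN, UTEMP, USHK, NUM_STATE };

inline void zero_nonadvected_components (std::array<double, NUM_STATE>& fluxLF)
{
    fluxLF[UTEMP] = 0.0;   // no temperature flux through this interface
    fluxLF[USHK]  = 0.0;   // no shock-flag flux either
}
```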
The issues from the previous two comments have been resolved. The original issue is still present. It can apparently be fixed by only updating drhoLFL and drhoLFR if fluxLF violates the density floor on either of the respective interfaces, but it is unclear why this would be a correct thing to do.
This resolves the issue in #646. We were getting non-deterministic behavior because we were mixing and matching updates from the left and right side of the interface. By only applying the limiter to whichever side of the interface would go negative, and thus only obtaining a single theta for the limiter, we sidestep the issue and also simplify the code.
I believe this is now resolved. The issue appears to have occurred because we were not calculating the limiter in a self-consistent manner -- the Lax-Friedrichs flux was being updated from both sides of the interface even though the theta values are supposed to be one-sided. I have made the code consistent so that it only uses a one-sided theta to calculate the update, and in doing so I realized that we don't need two theta values, only one, so I simplified the code as well. There is still likely work to do on this limiter, because it results in more timesteps on the above test problem than not using it, but at least the non-determinism is now gone.
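To make the resolution concrete, here is a sketch in my own notation (not the actual Castro implementation): the limited interface flux is the blend F = theta * F_HO + (1 - theta) * F_LF, and a single theta per interface is taken from whichever adjacent cell would fall below the density floor under the high-order flux, so both cells see the same limited flux. The sketch assumes the Lax-Friedrichs update itself respects the floor.

```cpp
// Sketch of one-sided limiting: a single theta per interface, chosen from
// the side that would violate the density floor (illustrative, not the
// actual Castro implementation).
#include <algorithm>
#include <cstdio>
#include <initializer_list>

struct Side {
    double rho_HO;   // density this cell would have using the high-order flux
    double rho_LF;   // density this cell would have using the Lax-Friedrichs flux
};

// Returns the blending factor theta in [0, 1] for this interface.
double one_sided_theta (const Side& left, const Side& right, double small_dens)
{
    double theta = 1.0;                      // 1 => keep the high-order flux

    for (const Side& s : {left, right}) {
        if (s.rho_HO < small_dens) {
            // The post-update density is linear in theta:
            //   rho(theta) = theta * rho_HO + (1 - theta) * rho_LF,
            // so solve rho(theta) = small_dens (assumes rho_LF >= small_dens).
            double t = (s.rho_LF - small_dens) / (s.rho_LF - s.rho_HO);
            theta = std::min(theta, std::clamp(t, 0.0, 1.0));
        }
    }
    return theta;                            // applied identically on both sides
}

int main () {
    // Left cell would dip below the floor under the high-order flux; right is fine.
    Side left{-1.0e-6, 2.0e-5}, right{3.0e-4, 4.0e-4};
    std::printf("theta = %g\n", one_sided_theta(left, right, 1.0e-5));
}
```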
When compiled using MPI + OpenMP with either PGI or GNU, wdmerger's results depend on the number of OpenMP threads and are non-deterministic in the multi-threading case. (This is also seen with CUDA.) I am compiling with
and running on a Summit-like system with either
or
The runs vary in the number of timesteps taken to reach the final simulation time. If I run with OMP_NUM_THREADS=1 and varying numbers of MPI ranks, the results do not differ.
This effect does not occur with do_grav=0, but it is not clear if that means the issue is specifically in gravity or the simulation is just so different without gravity that the issue doesn't have a chance to manifest.