-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add nompi stub #425
base: develop
Are you sure you want to change the base?
Add nompi stub #425
Conversation
@cwsmith, build is failing for this PR. I think we should add |
@flagdanger and I are working on a very large update of PCU that should make this all easier by getting rid of the singleton. Pull request for that coming within the next week. |
@bobpaw sorting out a few things to make the pull request ready to merge, but #388 the draft pull request for replacing the singleton in PCU. Although we replaced the interface used in the test cases, this pull request is designed to work with no modification to existing codes by using a global PCU state when the old style interface is used with |
@jacobmerson, we will plan to use your PR for the future of core, but the sponsor wants this functionality soon, so we will use this branch until yours is finished. |
@bobpaw Does |
No, at the moment it only checks if PCU has been initialized using a global state variable. We could check with |
@bobpaw OK. I'm wondering if it will make sense to make the MPI check in that call, and if it fails call MPI_Init. But... handling MPI_Finalize could be a bit more tricky; other libraries may still be running and using MPI beyond the lifetime of PCU. I'd prefer symmetry in the interface so your original suggestion may be best. |
Added pcu_pnompi.c. Added SCOREC_NO_MPI CMake option. Removed direct MPI calls in parma, phasta, and PUMI. Added incomplete reimplementations of those MPI calls.
Add stubs for MPI_Init and MPI_Finalize. Add pcu_pnompi_types.h for MPI typedefs. Add status message about SCOREC_NO_MPI flag. Signed-off-by: Aiden Woodruff <[email protected]>
Signed-off-by: Aiden Woodruff <[email protected]>
Replace MPI_Comm_free with PCU_Comm_Free_One. Signed-off-by: Aiden Woodruff <[email protected]>
Signed-off-by: Aiden Woodruff <[email protected]>
Signed-off-by: Aiden Woodruff <[email protected]>
This will avoid object collisions when linking to libraries that already use MPI. - i.e. the intended use case for SCOREC_NO_MPI. Signed-off-by: Aiden Woodruff <[email protected]>
Changed add_nompi_msg to record receiver. - Since there's no MPI, sender == receiver but I want to quiet the warnings. Signed-off-by: Aiden Woodruff <[email protected]>
Add return type for pcu_pmpi_free, pcu_pmpi_split. Signed-off-by: Aiden Woodruff <[email protected]>
Add linked list traversal to find a matching message. Removed error message that did not actually indicate errors. - If no message is found it's perfectly reasonable and in fact correct to return 0 and that's how pmpi knows to return false. Signed-off-by: Aiden Woodruff <[email protected]>
Run smoke tests without mpirun given SCOREC_NO_MPI. Add SCOREC_NO_MPI to the GitHub Actions test matrix. Signed-off-by: Aiden Woodruff <[email protected]>
Signed-off-by: Aiden Woodruff <[email protected]>
I use clock_gettime, which is one of the many methods mpich uses depending on the system. Signed-off-by: Aiden Woodruff <[email protected]>
Also update TriBITS package file. Signed-off-by: Aiden Woodruff <[email protected]>
- Add mpi_test_depends function to replace set_test_properties. This function checks if SCOREC_NO_MPI is defined and assumes nonexistent tests were multiproc ones. Signed-off-by: Aiden Woodruff <[email protected]>
Signed-off-by: Aiden Woodruff <[email protected]>
- parma/group/parma_group.cc(runInGroups): remove PCU/MPI free that will be carried out by pcu::PCU::~PCU. - test/xgc_split.cc: remove direct mpi.h inclusion - test/pumiLoadMesh.cc: remove direct mpi.h inclusion. - test/modelInfo.cc: remove direct #include <mpi.h>. - phasta/phstream.cc(::getTime): use new pcu::Time. - test/fieldReduce.cc: use correct PCUObj.Peers. - pumi/pumi_sys.cc(pumi_sync): use PCU Barrier instead of MPI Barrier. - pcu/PCU.h: make PCU_Comm (holdover) functions extern C and PCU_Wtime. - pumi/pumi_ghost.cc(do_off_part_bridge): fix typo (forgot parens). - pumi/pumi_mesh.cc: capitalize GetMPIComm. - pumi/pumi_numbering(pumi_numbering_print): use pcu::PCU::Barrier instead of WORLD barrier. - pcu/PCU_C.h: optionally include mpi or stub. - pcu/pcu_c.cc: add split stub and include pcu_pmpi.h. - pcu/pcu_pmpi.c: remove leftover conflict markers. - pcu/pcu_pnompi.h: update function signatures, make extern C, and remove obsolete stubs. - pcu/pcu_pnompi.c: remove globals and use pcu_mpi_t*. - clean up formatting. - remove unused or (void)ed argument names. Signed-off-by: Aiden Woodruff <[email protected]>
- pcu/PCU.cc(PCU::~PCU): clean up message order if it exists (valgrind is happy now). Signed-off-by: Aiden Woodruff <[email protected]>
- pcu/pcu_pmpi.c: remove pcu_pmpi_switch, pcu_pmpi_comm which were used by old nompi stubs. - (pcu_pmpi_allgather, pcu_pmpi_allreduce): break long lines and clean up indentation. Signed-off-by: Aiden Woodruff <[email protected]>
I compared the CTest suite resultant meshes in pumi-meshes and build/test/ and they are identical except for build/test/mis_test. However, MIS uses random numbers, so I am figuring they are otherwise identical. @onkarsahni said:
|
- example/mpi-nompi/test_coll.cc: add new executable to run the same tests as hello world but with assertions. - example/mpi-nompi/CMakeLists.txt: use new test_coll which allows the removal of regexes and no temporal non-determinism. Signed-off-by: Aiden Woodruff <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly focused on the changes in the C++ and CMake code and did not review the C implementation in detail.
I have one major concern that we need to clarify before merging is a more concrete understanding of the semantics of the class's comm ownership. I'm currently against a shared ownership model for this particular case. And, highly against anything that puts the onus of remembering to clean things up on the user. Is there a simple case that can demonstrate why you needed to add the ownscomm
logic?
The other more minor concern is that we should get rid of the need for #ifdefs in the user code by providing our own init
and finalize
routines. That will substantially clean up the executables.
The rest is very minor stuff that should be quick to update.
Otherwise, I think this is very much moving in the right direction. The changes clean up a fair number of places where we have a mix of MPI and PCU use which is bound to cause issues.
Signed-off-by: Aiden Woodruff <[email protected]>
- example/mpi-nompi/hello.cc: use std::vector instead of new/delete. - replace print_array with print_vector. - example/mpi-nompi/test_coll.cc: use std::vector instead of new/delete. Signed-off-by: Aiden Woodruff <[email protected]>
Signed-off-by: Aiden Woodruff <[email protected]>
I was considering adding a specialized Outside of something like that, what is so bad about having #ifdefs? When we discussed, we thought it would help to make it clear what is going on to people looking at tests as examples. |
Are the |
@cwsmith, the places with |
The thing about ifdefs is it can make the code much more unreadable and the user needs to know under the hood whether they should have MPI or not. If we wrap those ifdefs in our own initialize function then we take care of that subtlety for them.
I don't like this solution. The reason we have the PCU object is that we may have many of them lying around with different communicators under the hood, but ultimately still using the same MPI library. I have use cases where this is needed. That constructor means the PCU object may or may not own the MPI initialization. What happens when the owning PCU goes out of scope? All other PCUs that are still alive would break as they would be doing MPI operations after So, there needs to be some library/MPI initialization at a higher level than the PCU object. |
Shouldn't the world-initializing PCU only go out of scope at the end of the main function? Shouldn't all other objects have been destructed by then? |
I am not touching apfCGNS code yet because pcgnslib.h has |
We determined that instead of |
This reverts commit d150eb6.
- pcu/PCU.h: add pcu::PCU_Init and finalize declarations. - pcu/PCU.cc: add definitions. - test/*.cc: replace MPI_Init with pcu::PCU_Init and finalize - phasta/*.cc: replace MPI_Init with PCU_Init. Signed-off-by: Aiden Woodruff <[email protected]>
- example/mpi-nompi: use static pcu object instead of dynamically allocating. - test/cap*.cc: use static pcu object instead of dynamic allocation. Signed-off-by: Aiden Woodruff <[email protected]>
- example/mpi-nompi/hello.cc: update call using PCUObj address. Signed-off-by: Aiden Woodruff <[email protected]>
- pcu/PCU.h: include <memory> and change PCU::Split signature to return std::unique_ptr. - pcu/PCU.cc: return unique_ptr in PCU::Split. - parma/group/parma_group.cc: update PCU::Split call. - phasta/phCook.cc: update to use unique_ptr. remove delete. - pumi/pumi_mesh.cc: update to use unique_ptr and remove delete. - test/repartition.cc: update to use unique_ptr. - test/split.cc: inline getGroupedPCU() and update to use unique_ptr. - test/ugrid.cc: inline getGroupedPCU() and update to use unique_ptr. - test/zsplit.cc: inline getGroupedPCU() and update to use unique_ptr. Signed-off-by: Aiden Woodruff <[email protected]>
- pcu/PCU.h: add PCU::DupComm. - pcu/PCU.cc: add DupComm definition. - pcu/pcu_mpi.[ch]: add pcu_mpi layer. - pcu/pcu_pmpi.[ch]: add pcu_pmpi layer. - pcu/pcu_pnompi.c: add stub pcu_pmpi_dup implementation. Signed-off-by: Aiden Woodruff <[email protected]>
SCOREC_NO_MPI is also likely exclusive with ENABLE_ZOLTAN since zoltan.h has |
Hmmm. Zoltan can be used locally on each process (see the MPI_COMM = SELF) to do what we called 'local part splitting' but that use case then relies on migrating elements to other processes via ... MPI. So, yes, |
For when I update apfZoltanCallbacks.cc, Zoltan duplicates the comm, so I should free my reference after Zoltan create. |
- pcu/PCU.h: remove GetMPIComm, GetComm, SwitchMPIComm, SwitchComm, OwnsComm, OwnsComm(bool). - (PCU::Split): change return type to int and set output pointer. - update documentation. - pcu/PCU.cc: remove GetMPIComm, GetComm, SwitchMPIComm, SwitchComm, OwnsComm. - (PCU::Split): remove OwnsComm call. - (PCU::DupComm): return int and set pointer. - pcu/pcu_mpi.h: remove pcu_mpi_struct original_comm and owned. - pcu/pcu_pmpi.c (pcu_pmpi_init): do not copy original comm and remove self->owned. - (pcu_pmpi_finalize): remove MPI_Comm_free for owned original_comm. - (pcu_pmpi_split): split over user_comm. - pcu/pcu_pnompi.c: update init and finalize. - pcu/PCU_C.h: remove Switch_Comm and Get_Comm. - add Comm_Dup and Comm_Split. - pcu/pcu_c.cc: remove Switch_Comm and Get_Comm. add Comm_Dup with pointer to newpcu that releases the unique_ptr and Comm_Split. Signed-off-by: Aiden Woodruff <[email protected]>
- CMakeLists.txt: add error when SCOREC_NO_MPI and ENABLE_CGNS. - zoltan/CMakeLists.txt: add error when SCOREC_NO_MPI and ENABLE_ZOLTAN. Signed-off-by: Aiden Woodruff <[email protected]>
- zoltan/apfZoltanCallbacks.cc: use DupComm and free our duplicate after Zoltan_Create. Signed-off-by: Aiden Woodruff <[email protected]>
- pumi/pumi_mesh.cc (pumi_mesh_load): switch back to original PCU object instead of split (fixing error introduced by 46be470). Signed-off-by: Aiden Woodruff <[email protected]>
@cwsmith, I'm running into a subtle bug that I'm hoping you can help with. The test case New code: Lines 213 to 241 in 0069474
Old code: Lines 234 to 260 in a900c47
|
Add nompi stub
Add a stub implementation of MPI in
pcu/pcu_pnompi.c
.Remove direct calls to MPI functions.
This adds a
SCOREC_NO_MPI
build option for users running on single-process, single-machine setups that do not want to compile with MPI. Message passing is simulated in memory.