Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Memory leaks inside RCPs #413

Open
vovannikov opened this issue Feb 20, 2025 · 4 comments
Open

[BUG] Memory leaks inside RCPs #413

vovannikov opened this issue Feb 20, 2025 · 4 comments

Comments

@vovannikov
Copy link
Contributor

vovannikov commented Feb 20, 2025

Description

There are memory leaks in the new structure solver. The debug version of Trilinos clearly reveals it, see the attached log file. valgrind also detects them, see another log.

I looked at those location indicated by Trilinos, and it seems the leaks are related for those objects which get added to a shared ParametersList in multiple places. At least, the pattern preceding the appearances of this leaks is always the same: something gets plugged into an arbitrary branch of a large params list. I assume, there is a cycle dependency appearing somewhere. I could not identify the source of the problem with a glance look at the code since the hierarchy between the shared pointers is actually quite complicated + a lot of objects behaving like global variables. I can try to debug it a bit further.

valgrind --leak-check=full --log-file=valgrind.log \
    ~/development/4C-debug/4C ~/development/4C/tests/input_files/structure_new_nln_solver_singlestep_cube_linear.dat test1

valgrind.log
4c-log.txt

Steps to reproduce

  1. Build debug version of Trilinos
  2. Build debug version of 4C and link it against the debug version of Trilinos
  3. Execute the command above
@amgebauer
Copy link
Member

Thanks for reporting the issue. That reminds me of #63

@vovannikov
Copy link
Contributor Author

Just another small remark, I think we should not abuse ParametersList for transferring objects all over the code, this is nasty. The pattern like this appears in multiple places:

// get a mutable reference to the linear solver parameter list
Teuchos::ParameterList& p_linsolver = const_cast<Teuchos::ParameterList&>(
pnox.sublist("Direction").sublist("Newton").sublist("Linear Solver"));
NOX::Nln::LinSystem::PrePostOperator::Map& prepostlinsystem_map =
NOX::Nln::LinSystem::PrePostOp::get_map(p_linsolver);
// create the new pre/post operator for the nox nln linear system
Teuchos::RCP<NOX::Nln::Abstract::PrePostOperator> prepostdbc_ptr =
Teuchos::make_rcp<NOX::Nln::LinSystem::PrePostOp::Dbc>(Teuchos::rcpFromRef(*this));
// insert/replace the old pointer in the map
prepostlinsystem_map[NOX::Nln::LinSystem::prepost_dbc] = prepostdbc_ptr;

@sebproell
Copy link
Member

FYI, I have been trying to reduce the number of RCPs/shared_ptrs inside 4C for precisely this reason. And fully agree on the ParameterList thing. It's really meant for input and should not be used as dynamic container to make bad inheritance work, see #61.

@vovannikov
Copy link
Contributor Author

Actually, the code example above is exactly a typical example that creates a cyclic dependency, we need just to a few more lines above to recognize it:

const Teuchos::ParameterList& pnox = timint_ptr_->get_data_sdyn().get_nox_params();
if (pnox.sublist("Direction").isSublist("Newton"))
{
if (pnox.sublist("Direction").sublist("Newton").isSublist("Linear Solver"))
{
// get a mutable reference to the linear solver parameter list
Teuchos::ParameterList& p_linsolver = const_cast<Teuchos::ParameterList&>(
pnox.sublist("Direction").sublist("Newton").sublist("Linear Solver"));
NOX::Nln::LinSystem::PrePostOperator::Map& prepostlinsystem_map =
NOX::Nln::LinSystem::PrePostOp::get_map(p_linsolver);
// create the new pre/post operator for the nox nln linear system
Teuchos::RCP<NOX::Nln::Abstract::PrePostOperator> prepostdbc_ptr =
Teuchos::make_rcp<NOX::Nln::LinSystem::PrePostOp::Dbc>(Teuchos::rcpFromRef(*this));
// insert/replace the old pointer in the map
prepostlinsystem_map[NOX::Nln::LinSystem::prepost_dbc] = prepostdbc_ptr;
}

timint_ptr_ is a shared pointer inside Solid::Dbc, it contains the ParametersList, to which we ultimately store the RCP to the this object itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants