-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Fix reductions not working inside graph #50
Conversation
- Graph submission now properly creates a host visible event on the command list allowing auxilliary resources to be cleaned up
@Bensuo Great! Thanks for fixing that. We should just add a note with a TODO to remove the code duplication. Now I'm wondering why we should not just call The idea was to decouple closing and executing the CommandList. But that is not what is happening now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work figuring this out ⭐
Think we should merge this to fix current issues, and then try to rework as Pablo suggests as a follow-up
So I've been looking into this idea and I think I have something along those lines. I think we only need to modify the initial check in I'll push it as an update to this PR but can always roll back to the already approved duplication approach if need be. |
- Remove code duplication in piQueueFlush in favour of calling executeCommandList - executeCommandList slightly modified to block execution only for command lists not allowed to be batched.
…callback The `TypeSystemMap::m_mutex` guards against concurrent modifications of members of `TypeSystemMap`. In particular, `m_map`. `TypeSystemMap::ForEach` iterates through the entire `m_map` calling a user-specified callback for each entry. This is all done while `m_mutex` is locked. However, there's nothing that guarantees that the callback itself won't call back into `TypeSystemMap` APIs on the same thread. This lead to double-locking `m_mutex`, which is undefined behaviour. We've seen this cause a deadlock in the swift plugin with following backtrace: ``` int main() { std::unique_ptr<int> up = std::make_unique<int>(5); volatile int val = *up; return val; } clang++ -std=c++2a -g -O1 main.cpp ./bin/lldb -o “br se -p return” -o run -o “v *up” -o “expr *up” -b ``` ``` frame #4: std::lock_guard<std::mutex>::lock_guard frame #5: lldb_private::TypeSystemMap::GetTypeSystemForLanguage <<<< Lock #2 frame #6: lldb_private::TypeSystemMap::GetTypeSystemForLanguage frame #7: lldb_private::Target::GetScratchTypeSystemForLanguage ... frame #26: lldb_private::SwiftASTContext::LoadLibraryUsingPaths frame #27: lldb_private::SwiftASTContext::LoadModule frame #30: swift::ModuleDecl::collectLinkLibraries frame #31: lldb_private::SwiftASTContext::LoadModule frame #34: lldb_private::SwiftASTContext::GetCompileUnitImportsImpl frame #35: lldb_private::SwiftASTContext::PerformCompileUnitImports frame #36: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetSwiftASTContext frame #37: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetPersistentExpressionState frame #38: lldb_private::Target::GetPersistentSymbol frame #41: lldb_private::TypeSystemMap::ForEach <<<< Lock #1 frame #42: lldb_private::Target::GetPersistentSymbol frame #43: lldb_private::IRExecutionUnit::FindInUserDefinedSymbols frame #44: lldb_private::IRExecutionUnit::FindSymbol frame #45: lldb_private::IRExecutionUnit::MemoryManager::GetSymbolAddressAndPresence frame #46: lldb_private::IRExecutionUnit::MemoryManager::findSymbol frame #47: non-virtual thunk to lldb_private::IRExecutionUnit::MemoryManager::findSymbol frame #48: llvm::LinkingSymbolResolver::findSymbol frame #49: llvm::LegacyJITSymbolResolver::lookup frame #50: llvm::RuntimeDyldImpl::resolveExternalSymbols frame #51: llvm::RuntimeDyldImpl::resolveRelocations frame #52: llvm::MCJIT::finalizeLoadedModules frame #53: llvm::MCJIT::finalizeObject frame #54: lldb_private::IRExecutionUnit::ReportAllocations frame #55: lldb_private::IRExecutionUnit::GetRunnableInfo frame #56: lldb_private::ClangExpressionParser::PrepareForExecution frame #57: lldb_private::ClangUserExpression::TryParse frame #58: lldb_private::ClangUserExpression::Parse ``` Our solution is to simply iterate over a local copy of `m_map`. **Testing** * Confirmed on manual reproducer (would reproduce 100% of the time before the patch) Differential Revision: https://reviews.llvm.org/D149949
Graph submission now properly creates a host visible event on the command list allowing auxilliary resources to be cleaned up.
Code is taken from_pi_queue::executeCommandList
, copy pasting a decent amount of code like that is not ideal but there isn't an immediately obvious solution to let us actually use that function when flushing.Updated to remove code duplication, instead relying on the
OKtoBatchCommand
parameter toexecuteCommandList
to let us determine when to actually submit the command list and when to block execution.This will at least make the examples that use reductions run, though it doesn't solve related problems with buffers and other regular operations on the queue.
Closes #24