Change record & replay relationship between queue / graph #58

EwanC · 2022-12-15T14:25:44Z

Motvated by a desire for better alignment with the queue record graph creation mechanism in the kernel fusion extension. See #53

ext::codeplay::experimental::fusion_wrapper w{q};
w.start_fusion();
// 'q' submissions
w.complete_fusion();

By changing the relationship between a queue and a graph so that recording starts and finishes on a graph we better match the above kernel fusion model. This design is also more exception safe as command_graph::end_recording() can be called in a RAII approach when a graph is destroyed.

As a result, a graph is now created from queue recording like:

ext::oneapi::experimental::command_graph graph;
graph.begin_recording({q});  // used to be q.begin_recording(graph) before this change
// 'q' submissions
graph.end_recording();    // used to be q.end_recording() before this change

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc

Bensuo

LGTM just some minor comments.

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc

Bensuo

LGTM just one nitpick!

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc

Better aligns the queue record graph creation mechansism with the [kernel fusion extension](intel#7098) ```cpp ext::codeplay::experimental::fusion_wrapper w{q}; w.start_fusion(); // 'q' submissions w.complete_fusion() ``` By changing the relationship between a queue and a graph so that recording starts and finishes on a graph we better match kernel fusion. This design is also more exception safe as `end_recording()` can be called in a RAII approach when a graph is destroyed. As a result a graph is now created from queue recording like: ```cpp ext::oneapi::experimental::command_graph graph; graph.begin_recording({q}); // 'q' submissions graph.end_recording(); ``` Addresses Issue #53

This specifies the behaviour on destruction of a modifiable `command_graph` to end recording of queues which are recording to the graph.

Overload the `command_graph::begin_recording()` and `command_graph::end_recording()` functions with variants for both a single queue and a list of queues.

Co-authored-by: Ben Tracy <[email protected]>

This is based on [feedback](intel#5626 (comment)) which points out that [Section 6.3.2](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_names_for_extensions_to_existing_classes_or_enumerations) of the SYCL spec says to use a vendor prefix for new functions to existing classes. I've not updated the `queue::begin_recording` and `queue::end_recording` entry points with this convention, as they will be removed in PR #58

…callback The `TypeSystemMap::m_mutex` guards against concurrent modifications of members of `TypeSystemMap`. In particular, `m_map`. `TypeSystemMap::ForEach` iterates through the entire `m_map` calling a user-specified callback for each entry. This is all done while `m_mutex` is locked. However, there's nothing that guarantees that the callback itself won't call back into `TypeSystemMap` APIs on the same thread. This lead to double-locking `m_mutex`, which is undefined behaviour. We've seen this cause a deadlock in the swift plugin with following backtrace: ``` int main() { std::unique_ptr<int> up = std::make_unique<int>(5); volatile int val = *up; return val; } clang++ -std=c++2a -g -O1 main.cpp ./bin/lldb -o “br se -p return” -o run -o “v *up” -o “expr *up” -b ``` ``` frame #4: std::lock_guard<std::mutex>::lock_guard frame #5: lldb_private::TypeSystemMap::GetTypeSystemForLanguage <<<< Lock #2 frame #6: lldb_private::TypeSystemMap::GetTypeSystemForLanguage frame #7: lldb_private::Target::GetScratchTypeSystemForLanguage ... frame #26: lldb_private::SwiftASTContext::LoadLibraryUsingPaths frame #27: lldb_private::SwiftASTContext::LoadModule frame #30: swift::ModuleDecl::collectLinkLibraries frame #31: lldb_private::SwiftASTContext::LoadModule frame #34: lldb_private::SwiftASTContext::GetCompileUnitImportsImpl frame #35: lldb_private::SwiftASTContext::PerformCompileUnitImports frame #36: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetSwiftASTContext frame #37: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetPersistentExpressionState frame #38: lldb_private::Target::GetPersistentSymbol frame #41: lldb_private::TypeSystemMap::ForEach <<<< Lock #1 frame #42: lldb_private::Target::GetPersistentSymbol frame #43: lldb_private::IRExecutionUnit::FindInUserDefinedSymbols frame #44: lldb_private::IRExecutionUnit::FindSymbol frame #45: lldb_private::IRExecutionUnit::MemoryManager::GetSymbolAddressAndPresence frame #46: lldb_private::IRExecutionUnit::MemoryManager::findSymbol frame #47: non-virtual thunk to lldb_private::IRExecutionUnit::MemoryManager::findSymbol frame #48: llvm::LinkingSymbolResolver::findSymbol frame #49: llvm::LegacyJITSymbolResolver::lookup frame #50: llvm::RuntimeDyldImpl::resolveExternalSymbols frame #51: llvm::RuntimeDyldImpl::resolveRelocations frame #52: llvm::MCJIT::finalizeLoadedModules frame #53: llvm::MCJIT::finalizeObject frame #54: lldb_private::IRExecutionUnit::ReportAllocations frame #55: lldb_private::IRExecutionUnit::GetRunnableInfo frame #56: lldb_private::ClangExpressionParser::PrepareForExecution frame #57: lldb_private::ClangUserExpression::TryParse frame #58: lldb_private::ClangUserExpression::Parse ``` Our solution is to simply iterate over a local copy of `m_map`. **Testing** * Confirmed on manual reproducer (would reproduce 100% of the time before the patch) Differential Revision: https://reviews.llvm.org/D149949

Currently, process of replacing bitwise operations consisting of `LSR`/`LSL` with `And` is performed by `DAGCombiner`. However, in certain cases, the `AND` generated by this process can be removed. Consider following case: ``` lsr x8, x8, #56 and x8, x8, #0xfc ldr w0, [x2, x8] ret ``` In this case, we can remove the `AND` by changing the target of `LDR` to `[X2, X8, LSL #2]` and right-shifting amount change to 56 to 58. after changed: ``` lsr x8, x8, #58 ldr w0, [x2, x8, lsl #2] ret ``` This patch checks to see if the `SHIFTING` + `AND` operation on load target can be optimized and optimizes it if it can.

EwanC added the Graph Specification Extension Specification related label Dec 15, 2022

EwanC requested review from reble, Bensuo and julianmi December 15, 2022 14:25

EwanC force-pushed the ewan/record_replay_relationship branch from 8326ed0 to 990c6ee Compare December 15, 2022 14:31

EwanC commented Dec 15, 2022

View reviewed changes

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc Outdated Show resolved Hide resolved

EwanC changed the title ~~Change record & replay queue / graph relationship~~ Change record & replay relationship between queue / graph Dec 15, 2022

Bensuo approved these changes Dec 15, 2022

View reviewed changes

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc Outdated Show resolved Hide resolved

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc Outdated Show resolved Hide resolved

sommerlukas reviewed Dec 16, 2022

View reviewed changes

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc Show resolved Hide resolved

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc Outdated Show resolved Hide resolved

EwanC force-pushed the ewan/record_replay_relationship branch from 990c6ee to eda3422 Compare December 19, 2022 14:47

EwanC commented Dec 19, 2022

View reviewed changes

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc Show resolved Hide resolved

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc Outdated Show resolved Hide resolved

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc Show resolved Hide resolved

Bensuo approved these changes Dec 20, 2022

View reviewed changes

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc Outdated Show resolved Hide resolved

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc Outdated Show resolved Hide resolved

EwanC force-pushed the ewan/record_replay_relationship branch from 41f114f to 222d412 Compare December 20, 2022 15:18

EwanC requested a review from Bensuo December 20, 2022 15:19

Bensuo approved these changes Dec 21, 2022

View reviewed changes

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc Outdated Show resolved Hide resolved

EwanC and others added 5 commits December 21, 2022 11:36

Add section on exception safety

2dfe1e0

This specifies the behaviour on destruction of a modifiable `command_graph` to end recording of queues which are recording to the graph.

Provide multiple overloads for begin/end recording

559c7f6

Overload the `command_graph::begin_recording()` and `command_graph::end_recording()` functions with variants for both a single queue and a list of queues.

Make wording consistent

88633e6

Co-authored-by: Ben Tracy <[email protected]>

Update after rebase on minor spec clarifications

778112b

EwanC force-pushed the ewan/record_replay_relationship branch from 3f26115 to 778112b Compare December 21, 2022 11:41

EwanC mentioned this pull request Jan 4, 2023

[SYCL][DOC] Graph API extensions intel/llvm#5626

Merged

EwanC mentioned this pull request Jan 5, 2023

Change handler::graph() to handler::ext_oneapi_graph() #65

Merged

reble approved these changes Jan 6, 2023

View reviewed changes

EwanC merged commit d49571e into sycl-graph-update Jan 10, 2023

EwanC deleted the ewan/record_replay_relationship branch June 28, 2023 14:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change record & replay relationship between queue / graph #58

Change record & replay relationship between queue / graph #58

EwanC commented Dec 15, 2022 •

edited

Loading

Bensuo left a comment

Bensuo left a comment

Change record & replay relationship between queue / graph #58

Change record & replay relationship between queue / graph #58

Conversation

EwanC commented Dec 15, 2022 • edited Loading

Bensuo left a comment

Choose a reason for hiding this comment

Bensuo left a comment

Choose a reason for hiding this comment

EwanC commented Dec 15, 2022 •

edited

Loading