Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change record & replay relationship between queue / graph #58

Merged
merged 5 commits into from
Jan 10, 2023

Conversation

EwanC
Copy link
Collaborator

@EwanC EwanC commented Dec 15, 2022

Motvated by a desire for better alignment with the queue record graph creation mechanism in the kernel fusion extension. See #53

ext::codeplay::experimental::fusion_wrapper w{q};
w.start_fusion();
// 'q' submissions
w.complete_fusion();

By changing the relationship between a queue and a graph so that recording starts and finishes on a graph we better match the above kernel fusion model. This design is also more exception safe as command_graph::end_recording() can be called in a RAII approach when a graph is destroyed.

As a result, a graph is now created from queue recording like:

ext::oneapi::experimental::command_graph graph;
graph.begin_recording({q});  // used to be q.begin_recording(graph) before this change
// 'q' submissions
graph.end_recording();    // used to be q.end_recording() before this change

@EwanC EwanC added the Graph Specification Extension Specification related label Dec 15, 2022
@EwanC EwanC requested review from reble, Bensuo and julianmi December 15, 2022 14:25
@EwanC EwanC force-pushed the ewan/record_replay_relationship branch from 8326ed0 to 990c6ee Compare December 15, 2022 14:31
@EwanC EwanC changed the title Change record & replay queue / graph relationship Change record & replay relationship between queue / graph Dec 15, 2022
Copy link
Collaborator

@Bensuo Bensuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just some minor comments.

@EwanC EwanC force-pushed the ewan/record_replay_relationship branch from 990c6ee to eda3422 Compare December 19, 2022 14:47
@EwanC EwanC force-pushed the ewan/record_replay_relationship branch from 41f114f to 222d412 Compare December 20, 2022 15:18
@EwanC EwanC requested a review from Bensuo December 20, 2022 15:19
Copy link
Collaborator

@Bensuo Bensuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just one nitpick!

EwanC and others added 5 commits December 21, 2022 11:36
Better aligns the queue record graph creation mechansism
with the [kernel fusion
extension](intel#7098)

```cpp
ext::codeplay::experimental::fusion_wrapper w{q};
w.start_fusion();
// 'q' submissions
w.complete_fusion()
```

By changing the relationship between a queue and a graph so
that recording starts and finishes on a graph we better match
kernel fusion. This design is also more exception safe as
`end_recording()` can be called in a RAII approach when a graph
is destroyed.

As a result a graph is now created from queue recording like:
```cpp
ext::oneapi::experimental::command_graph graph;
graph.begin_recording({q});
// 'q' submissions
graph.end_recording();
```

Addresses Issue #53
This specifies the behaviour on destruction of a modifiable
`command_graph` to end recording of queues which are recording
to the graph.
Overload the `command_graph::begin_recording()` and
`command_graph::end_recording()` functions with variants for both
a single queue and a list of queues.
Co-authored-by: Ben Tracy <[email protected]>
@EwanC EwanC force-pushed the ewan/record_replay_relationship branch from 3f26115 to 778112b Compare December 21, 2022 11:41
EwanC added a commit that referenced this pull request Jan 5, 2023
This is based on
[feedback](intel#5626 (comment)) which points out that
[Section
6.3.2](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_names_for_extensions_to_existing_classes_or_enumerations)
of the SYCL spec says to use a vendor prefix for new functions to
existing classes.

I've not updated the `queue::begin_recording` and `queue::end_recording`
entry points with this convention, as they will be removed in
PR #58
EwanC added a commit that referenced this pull request Jan 9, 2023
This is based on
[feedback](intel#5626 (comment)) which points out that
[Section
6.3.2](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_names_for_extensions_to_existing_classes_or_enumerations)
of the SYCL spec says to use a vendor prefix for new functions to
existing classes.

I've not updated the `queue::begin_recording` and `queue::end_recording`
entry points with this convention, as they will be removed in
PR #58
@EwanC EwanC merged commit d49571e into sycl-graph-update Jan 10, 2023
reble pushed a commit that referenced this pull request May 15, 2023
…callback

The `TypeSystemMap::m_mutex` guards against concurrent modifications
of members of `TypeSystemMap`. In particular, `m_map`.

`TypeSystemMap::ForEach` iterates through the entire `m_map` calling
a user-specified callback for each entry. This is all done while
`m_mutex` is locked. However, there's nothing that guarantees that
the callback itself won't call back into `TypeSystemMap` APIs on the
same thread. This lead to double-locking `m_mutex`, which is undefined
behaviour. We've seen this cause a deadlock in the swift plugin with
following backtrace:

```

int main() {
    std::unique_ptr<int> up = std::make_unique<int>(5);

    volatile int val = *up;
    return val;
}

clang++ -std=c++2a -g -O1 main.cpp

./bin/lldb -o “br se -p return” -o run -o “v *up” -o “expr *up” -b
```

```
frame #4: std::lock_guard<std::mutex>::lock_guard
frame #5: lldb_private::TypeSystemMap::GetTypeSystemForLanguage <<<< Lock #2
frame #6: lldb_private::TypeSystemMap::GetTypeSystemForLanguage
frame #7: lldb_private::Target::GetScratchTypeSystemForLanguage
...
frame #26: lldb_private::SwiftASTContext::LoadLibraryUsingPaths
frame #27: lldb_private::SwiftASTContext::LoadModule
frame #30: swift::ModuleDecl::collectLinkLibraries
frame #31: lldb_private::SwiftASTContext::LoadModule
frame #34: lldb_private::SwiftASTContext::GetCompileUnitImportsImpl
frame #35: lldb_private::SwiftASTContext::PerformCompileUnitImports
frame #36: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetSwiftASTContext
frame #37: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetPersistentExpressionState
frame #38: lldb_private::Target::GetPersistentSymbol
frame #41: lldb_private::TypeSystemMap::ForEach                 <<<< Lock #1
frame #42: lldb_private::Target::GetPersistentSymbol
frame #43: lldb_private::IRExecutionUnit::FindInUserDefinedSymbols
frame #44: lldb_private::IRExecutionUnit::FindSymbol
frame #45: lldb_private::IRExecutionUnit::MemoryManager::GetSymbolAddressAndPresence
frame #46: lldb_private::IRExecutionUnit::MemoryManager::findSymbol
frame #47: non-virtual thunk to lldb_private::IRExecutionUnit::MemoryManager::findSymbol
frame #48: llvm::LinkingSymbolResolver::findSymbol
frame #49: llvm::LegacyJITSymbolResolver::lookup
frame #50: llvm::RuntimeDyldImpl::resolveExternalSymbols
frame #51: llvm::RuntimeDyldImpl::resolveRelocations
frame #52: llvm::MCJIT::finalizeLoadedModules
frame #53: llvm::MCJIT::finalizeObject
frame #54: lldb_private::IRExecutionUnit::ReportAllocations
frame #55: lldb_private::IRExecutionUnit::GetRunnableInfo
frame #56: lldb_private::ClangExpressionParser::PrepareForExecution
frame #57: lldb_private::ClangUserExpression::TryParse
frame #58: lldb_private::ClangUserExpression::Parse
```

Our solution is to simply iterate over a local copy of `m_map`.

**Testing**

* Confirmed on manual reproducer (would reproduce 100% of the time
  before the patch)

Differential Revision: https://reviews.llvm.org/D149949
@EwanC EwanC deleted the ewan/record_replay_relationship branch June 28, 2023 14:36
aarongreig pushed a commit that referenced this pull request Sep 24, 2024
Currently, process of replacing bitwise operations consisting of
`LSR`/`LSL` with `And` is performed by `DAGCombiner`.

However, in certain cases, the `AND` generated by this process
can be removed.

Consider following case:
```
        lsr x8, x8, #56
        and x8, x8, #0xfc
        ldr w0, [x2, x8]
        ret
```

In this case, we can remove the `AND` by changing the target of `LDR`
to `[X2, X8, LSL #2]` and right-shifting amount change to 56 to 58.

after changed:
```
        lsr x8, x8, #58
        ldr w0, [x2, x8, lsl #2]
        ret
```

This patch checks to see if the `SHIFTING` + `AND` operation on load
target can be optimized and optimizes it if it can.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Graph Specification Extension Specification related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants