Introduce really rough linalg matrix proposal #404

Open
wants to merge 2 commits into
base: main
110 changes: 110 additions & 0 deletions proposals/NNNN-linalg-matrix.md
@@ -0,0 +1,110 @@
<!-- {% raw %} -->

# Linear Algebra Matrix

* Proposal: [NNNN](NNNN-linalg-matrix.md)
* Author(s): [Chris Bieneman](https://github.com/llvm-beanz)
* Sponsor: TBD
* Status: **Under Consideration**

## Introduction

GPUs are exceptional parallel data processors, but it is becoming increasingly
important to model operations with cross-thread data dependencies. In HLSL and
Direct3D these operations are called Wave operations; in Vulkan they are called
Subgroup operations. Related terms like Quad or derivatives have similar
meanings in different scoping contexts. Vulkan has also recently introduced the
term "cooperative" for operations that require participation from multiple
threads. These can be viewed much like derivative operations, but across the
full SIMD unit instead of a subset of threads.

All of these terms refer to the way the underlying instructions execute, not
necessarily what they do. One big part of this proposal is to take 5 steps back
and talk about what they do: linear algebra.

## Motivation

HLSL has a Vulkan extension for SIMD matrix types, [0021 - vk::Cooperative
Matrix](0021-vk-coop-matrix.md), and DirectX previewed a similar feature in
SM 6.8 called [Wave Matrix](https://github.com/microsoft/hlsl-specs/pull/61).
This proposal aims to merge the two into a unified language feature that
can be supported on all platforms (with some platform-specific limitations).

## Proposed solution

Below is a proposed pseudo-HLSL API. The proposal uses C++20 concepts to
express template type constraints, which avoids SFINAE complications.

```c++
namespace hlsl {

template <class T>
concept ArithmeticScalar = std::is_arithmetic<T>::value;

namespace linalg {

template <typename ComponentTy, uint M, uint N>
  requires ArithmeticScalar<ComponentTy>
class Matrix {
  template <typename NewCompTy> Matrix<NewCompTy, M, N> cast();

  Matrix operator+(Matrix);
  Matrix operator-(Matrix);
  Matrix operator*(Matrix);
  Matrix operator/(Matrix);
  // Review comment (Member): What does operator/ do on a matrix?
  // Reply (Collaborator Author): elementwise division, will add comments
  // Reply (Collaborator Author): All of the operators are element-wise,
  // aligning with our conventions for vector and matrix types.


  template <typename T>
    requires ArithmeticScalar<T>
  Matrix operator+(T);
  template <typename T>
    requires ArithmeticScalar<T>
  Matrix operator-(T);
  template <typename T>
    requires ArithmeticScalar<T>
  Matrix operator*(T);
  template <typename T>
    requires ArithmeticScalar<T>
  Matrix operator/(T);
  // Review comment (Member), on lines +56 to +67 (the scalar operators above):
  // Are these element-wise?
  // Reply (Collaborator Author): Yea, I'll add comments. This aligns with our
  // convention for matrix and vector operators for other builtin types.


  static Matrix Splat(ElTy Val);
  // Review comment (Member): Suggested change:
  //   - static Matrix Splat(ElTy Val);
  //   + static Matrix Splat(ComponentTy Val);
  // Review comment (Member): Actually, I suspect that this code might have
  // used all of T, ElTy and ComponentTy at some point to refer to the same
  // thing?
  // Reply (Collaborator Author): Oh... yea, I crossed some template wires.
  // Let me fix that.

  static Matrix Load(ByteAddressBuffer Res, uint StartOffset, uint Stride,
                     bool ColMajor, uint Align = sizeof(ComponentTy));
  static Matrix Load(RWByteAddressBuffer Res, uint StartOffset, uint Stride,
                     bool ColMajor, uint Align = sizeof(ComponentTy));

  static Matrix Load(groupshared ElTy Arr[], uint StartIdx, uint Stride,
                     bool ColMajor);

  void Store(RWByteAddressBuffer Res, uint StartOffset, uint Stride,
             bool ColMajor, uint Align = sizeof(ComponentTy));

  void Store(groupshared ElTy Arr[], uint StartIdx, uint Stride, bool ColMajor);

  void MultiplyAccumulate(const ref Matrix<T, N, M>);
  void SumAccumulate(const ref Matrix<T, N, M>);
};

template <typename T, uint M, uint N, uint K>
Matrix<T, M, N> Multiply(const ref Matrix<T, M, K>, const ref Matrix<T, K, N>);

} // namespace linalg
} // namespace hlsl
```

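The `Load` and `Store` signatures take `StartOffset`, `Stride`, and `ColMajor`, but the proposal does not yet define how they combine. One plausible reading, sketched below in host-side C++, is that `Stride` is the byte distance between the starts of consecutive rows (row-major) or consecutive columns (column-major), with elements inside a row or column packed contiguously. That interpretation is an assumption, as are the helper names `elementByteOffset` and `loadMatrix` and the use of a `std::vector<unsigned char>` in place of a `ByteAddressBuffer`.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <vector>

namespace sketch { // hypothetical namespace, not part of the proposal

// Byte offset of element (Row, Col) under the assumed meaning of Stride:
// bytes between consecutive rows (row-major) or columns (column-major).
inline std::uint32_t elementByteOffset(std::uint32_t StartOffset,
                                       std::uint32_t Stride, bool ColMajor,
                                       std::uint32_t Row, std::uint32_t Col,
                                       std::uint32_t ElemSize) {
  const std::uint32_t Major = ColMajor ? Col : Row;
  const std::uint32_t Minor = ColMajor ? Row : Col;
  return StartOffset + Major * Stride + Minor * ElemSize;
}

// Reads an M x N matrix out of a raw byte buffer. The result is always
// returned row-major, whatever the layout in the buffer. No bounds checking
// is performed; the caller must supply a large enough buffer.
template <typename ComponentTy, std::uint32_t M, std::uint32_t N>
std::array<ComponentTy, M * N>
loadMatrix(const std::vector<unsigned char> &Buffer, std::uint32_t StartOffset,
           std::uint32_t Stride, bool ColMajor) {
  std::array<ComponentTy, M * N> Result{};
  const auto ElemSize = static_cast<std::uint32_t>(sizeof(ComponentTy));
  for (std::uint32_t Row = 0; Row < M; ++Row) {
    for (std::uint32_t Col = 0; Col < N; ++Col) {
      const std::uint32_t Offset =
          elementByteOffset(StartOffset, Stride, ColMajor, Row, Col, ElemSize);
      std::memcpy(&Result[Row * N + Col], Buffer.data() + Offset, ElemSize);
    }
  }
  return Result;
}

} // namespace sketch
```

Under this reading, the same six floats can be viewed as a row-major 2x3 matrix with a stride of 12 bytes or as a column-major 2x3 matrix with a stride of 8 bytes; only the address calculation changes.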
## Detailed design

## Outstanding Questions
> Review comment (Member):
>
> * Support for packed types?
> * Support for other number formats that aren't natively supported by HLSL?


* Do we need a "scope" parameter or is it reasonable to assume Subgroup scope
  for all operations at least for an initial feature?
* Do we need the usage to be part of the type?
  * Vulkan has a "use" template parameter, which serves a similar purpose to the
    D3D WaveMatrix "Left" and "Right" types. The compiler should be able to
    detect the usage and introduce memory shuffling automatically (with
    potential performance impact).
* Do we need element-wise accessors? The Vulkan extension doesn't support
  element manipulations, but this has been identified as an important feature.
* What will the DXIL representation look like?
  * This will be addressed in a separate proposal.

<!-- {% endraw %} -->