[Lang] Revisit memory model #321
base: main
Changes from 7 commits
a483c45
0040940
1ce6657
809bbc5
e66e423
2398305
d89823f
5f062b8
0516676
@@ -273,21 +273,46 @@

\Sec{\acrshort{hlsl} Memory Models}{Intro.Memory}

\p Memory accesses for \gls{sm} 5.0 and earlier operate on 128-bit slots aligned
on 128-bit boundaries. This layout was optimized for the common case in early
shaders, where the data processed on the GPU was usually 4-element vectors of
32-bit data types.

\p On modern hardware, memory access restrictions are loosened: reads of 32-bit
multiples are supported starting with \gls{sm} 5.1, and reads of 16-bit
multiples are supported starting with \gls{sm} 6.0. \gls{sm} features are fully
documented in the \gls{dx} Specifications, and this document does not attempt
to elaborate on them further.

\p The fundamental storage unit in \acrshort{hlsl} is a \textit{byte}, which consists
of 8 \textit{bits}. Each \textit{bit} stores a single value, either 0 or 1. Each byte
has a unique \textit{memory location}, alternatively called an \textit{address}.

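As an illustrative sketch only (not part of the specification), the byte-addressed model described above can be written in C++. It assumes a flat space of byte addresses; the \texttt{Address} alias and the \texttt{locationsOf} helper are hypothetical names. A value of N bytes starting at address a occupies the N distinct memory locations a through a+N-1.

```cpp
#include <climits>
#include <cstdint>
#include <set>

// Assumption for this sketch: the host byte matches the HLSL byte (8 bits).
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Hypothetical address type: one unique address per byte.
using Address = std::uint64_t;

// Returns the set of memory locations (byte addresses) occupied by a value
// of `sizeInBytes` bytes that starts at address `start`.
std::set<Address> locationsOf(Address start, std::uint64_t sizeInBytes) {
  std::set<Address> locations;
  for (std::uint64_t i = 0; i < sizeInBytes; ++i) {
    locations.insert(start + i);  // each byte has its own memory location
  }
  return locations;
}
```

For example, \texttt{locationsOf(8, 4)} models a 32-bit value at address 8 and yields the four locations 8, 9, 10 and 11.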
\p Each read or write to a memory location is called a \textit{memory access}.
Operations that perform memory accesses are called \textit{memory operations}. A
memory operation may operate on one or more memory locations. A memory operation
must not alter memory at any location outside the set of memory locations it
is operating on\footnote{Two subtle notes here: (1) A bit-field's memory location
includes adjacent bit-fields, so reads and writes to bit-fields are expected to
read and write adjacent memory if that memory is within the same set of
locations; (2) padding bits inside a structure are included in the memory
location of the structure. Reads and writes of uninitialized memory are
undefined, but a write is allowed to overwrite padding.}.

Review comment: To me the usage of 'same set of locations' is a bit ambiguous. After a few reads I assume it means the adjacent bit-fields. Maybe something like:
Suggested change

Review comment: Maybe a caveat is needed to specify that the adjacent bit-fields' memory locations do not ALSO include their adjacent bit-fields.

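Footnote (1) can be illustrated with a hedged C++ sketch. It assumes, purely for illustration, that a group of adjacent bit-fields is packed into a single 32-bit storage unit; the actual \acrshort{hlsl} bit-field layout is not described here, and the \texttt{writeBitField} and \texttt{readBitField} helpers are hypothetical. Writing one field is a read-modify-write of the whole unit, so the adjacent fields are read and written back, yet their values are preserved and no location outside the unit is touched.

```cpp
#include <cstdint>

// Writes `value` into the field occupying bits [offset, offset + width) of
// `unit`. The whole 32-bit unit is read and written back, so adjacent
// bit-fields in the same unit are accessed, but their bits are preserved.
// Assumes 0 < width < 32 and offset + width <= 32.
void writeBitField(std::uint32_t &unit, unsigned offset, unsigned width,
                   std::uint32_t value) {
  const std::uint32_t mask = ((std::uint32_t{1} << width) - 1u) << offset;
  const std::uint32_t old = unit;                     // read the whole unit
  unit = (old & ~mask) | ((value << offset) & mask);  // write the whole unit
}

// Reads the field occupying bits [offset, offset + width) of `unit`.
// Same assumptions on `offset` and `width` as writeBitField.
std::uint32_t readBitField(std::uint32_t unit, unsigned offset,
                           unsigned width) {
  return (unit >> offset) & ((std::uint32_t{1} << width) - 1u);
}
```

For instance, writing the value 3 into the 4-bit field at bit offset 4 of a unit holding \texttt{0xFFFFFFFF} leaves \texttt{0xFFFFFF3F}: only bits 4 through 7 change.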
\p Two sets of memory locations, \(A\) and \(B\), are said to
\textit{overlap} each other if some memory location in \(A\) is also in
\(B\), that is, if \(A \cap B \neq \emptyset\).

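The overlap definition translates directly into a set-intersection test. The following C++ sketch is illustrative only; \texttt{Address} and \texttt{overlaps} are hypothetical names, and representing a set of memory locations as a \texttt{std::set} of byte addresses is an assumption chosen for clarity rather than efficiency.

```cpp
#include <cstdint>
#include <set>

using Address = std::uint64_t;  // hypothetical byte-address type

// Returns true if the sets of memory locations A and B overlap, i.e. if
// their intersection is non-empty.
bool overlaps(const std::set<Address> &a, const std::set<Address> &b) {
  // Walk the smaller set and look each location up in the larger one.
  const std::set<Address> &smaller = a.size() <= b.size() ? a : b;
  const std::set<Address> &larger = a.size() <= b.size() ? b : a;
  for (Address location : smaller) {
    if (larger.count(location) != 0) {
      return true;
    }
  }
  return false;
}
```

For example, the locations of a 4-byte value at address 0 (locations 0 through 3) overlap those of a 4-byte value at address 3, but not those of one at address 4.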
\Sub{Memory Spaces}{Intro.Memory.Spaces}

\p \acrshort{hlsl} programs manipulate data stored in four distinct memory
spaces: thread, threadgroup, device and constant. Memory spaces are logical
abstractions over physical memory. Each memory space has a defined \textit{line
width}, which specifies the minimum size that can be read or written, and a
\textit{minimum alignment}, which defines the smallest addressable increment of
the memory space. The two values may, but need not, be the same.

\begin{note} | ||||||
\p Memory accesses for many resource types in \gls{dx} operate on 128-bit
slots aligned on 128-bit boundaries. In the terms of this specification,
those memory spaces have a 128-bit \textit{line width} and a 128-bit
\textit{minimum alignment}.
\end{note} | ||||||
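A hedged C++ sketch of how the two properties interact, assuming a memory space is fully described by a line width and a minimum alignment in bytes (the \texttt{MemorySpaceShape} type and both helpers are hypothetical, and real resource accesses carry further constraints not modeled here). An access may only start on a multiple of the minimum alignment, and under this model it touches every line that any of its bytes falls in. With the 128-bit (16-byte) values from the note, an access starting at byte offset 20 is not addressable, a 4-byte access at offset 16 touches one line, and a 32-byte access at offset 16 touches two.

```cpp
#include <cstdint>

// Hypothetical description of a memory space; both fields are in bytes.
struct MemorySpaceShape {
  std::uint64_t lineWidth;     // minimum size that can be read or written
  std::uint64_t minAlignment;  // smallest addressable increment
};

// True if an access may start at byte `offset` in this space.
bool isAddressable(const MemorySpaceShape &space, std::uint64_t offset) {
  return offset % space.minAlignment == 0;
}

// Number of whole lines touched by an access of `size` bytes (size > 0)
// starting at byte `offset`; under this model, at least this many lines
// would have to be read or written.
std::uint64_t linesTouched(const MemorySpaceShape &space, std::uint64_t offset,
                           std::uint64_t size) {
  const std::uint64_t firstLine = offset / space.lineWidth;
  const std::uint64_t lastLine = (offset + size - 1) / space.lineWidth;
  return lastLine - firstLine + 1;
}
```

Keeping the two values separate in the sketch mirrors the paragraph above: a space could, for example, allow 4-byte-aligned addressing while still transferring 16-byte lines.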
Comment on lines +329 to +339

Review comment: There are a few constraints around memory accesses in HLSL and DXIL that you're trying to abstract over here, but I'm not sure the "line width" idea captures them effectively. In some sense it might seem nice to boil down some similar rules into a simple concept, but it's worth noting why the rules are what they are and how they might change.
So I guess TLDR I think we should simply say two separate things rather than trying to define "line width":
Also note that I use "constant buffer" memory in my wording above, rather than "constant memory". We may want to keep that terminology available for if we ever do something in that space that doesn't carry the constant buffer legacy.

Review comment: I agree. I think part of the problem is trying to view all resource accesses as if they are like native memory accesses from the shader, with only the address space placing constraints on alignment and such. I think we can evolve constant memory and raw/structured buffer memory in this direction, but not typed/texture accesses. For memory that goes in this direction, I don't think "line width" would be a concept we want to use/keep, and "minimum alignment" will be defined in other ways, rather than by some fixed value applied to a memory type. Some notes:
I don't think I would agree with that. First, it's a confusing use of the term "element" here. Perhaps you had a different definition of "element" in mind than what I am interpreting here, but I struggle to think of a single definition that fits into this statement. Plenty of elements of structures and arrays in HLSL that exceed 128-bits in size can be placed into the constant buffer. You can declare a double4 (or array of such) in a constant buffer, which will use two rows for the vector. It's just that structures, array elements, and any type that cannot fit within the remainder of a row will be started at the beginning of the next available row. For some of these, that's part of the high-level packing rules, not necessarily something intrinsic to the DXIL interface. For array elements, they must be 128-bit aligned to ensure that array indexing maps to an index in the DXIL legacy constant buffer load op without impacting the index of the component read from the result. For legacy constant buffer load in DXIL, it's important to note that this load op doesn't mean all of the components are loaded - only the components that are extracted from the result structure need to be loaded. It's a subtle difference, but important in certain circumstances, and mismatches the concept of "line width" as applied to constant buffers. Think of the DXIL op as a compromise as there wasn't an easy way to express the thing that's expressed easily in DXBC asm like so:

\p A memory location in any space may overlap with another memory location in
the same space. A memory location in thread or threadgroup memory may not
overlap with memory locations in any other memory space. It is
implementation-defined whether memory locations in the remaining memory spaces
alias memory locations in different spaces.

Review comment: Does this mean you have say a MAU of 4 bytes, but you can have a memory location that accesses at byte 0 and a different memory location that accesses at byte 2?

Review comment: This sentence here makes me wonder if I'm interpreting the first sentence in this paragraph correctly. Edit: it didn't tag the line correctly. I mean line 314.

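The aliasing rules in the paragraph on overlapping memory locations can be summarized as a small decision function. This C++ sketch is illustrative and hypothetical (\texttt{MemorySpace}, \texttt{AliasAnswer}, and \texttt{mayAlias} are not names from \acrshort{hlsl} or DXIL); it encodes only the three rules stated in that paragraph.

```cpp
// The four memory spaces named in this section.
enum class MemorySpace { Thread, Threadgroup, Device, Constant };

// Possible answers to "may a location in space a alias one in space b?"
enum class AliasAnswer { Possible, Never, ImplementationDefined };

AliasAnswer mayAlias(MemorySpace a, MemorySpace b) {
  // Rule 1: locations within the same space may overlap.
  if (a == b) {
    return AliasAnswer::Possible;
  }
  auto isThreadOrThreadgroup = [](MemorySpace s) {
    return s == MemorySpace::Thread || s == MemorySpace::Threadgroup;
  };
  // Rule 2: thread and threadgroup locations never overlap another space.
  if (isThreadOrThreadgroup(a) || isThreadOrThreadgroup(b)) {
    return AliasAnswer::Never;
  }
  // Rule 3: remaining cross-space pairs (device and constant) are left to
  // the implementation.
  return AliasAnswer::ImplementationDefined;
}
```

Returning a three-valued answer rather than a boolean keeps the implementation-defined case visible instead of silently treating it as "may alias" or "never aliases".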
\SubSub{Thread Memory}{Intro.Memory.Spaces.Thread}

@@ -319,3 +344,9 @@
\gls{lane}s executing on the device. Constant memory is read-only, and an
implementation can assume that constant memory is immutable and cannot change
during execution.

\SubSub{Memory Space Overlap}{Intro.Memory.Spaces.Overlap}

\p The \textbf{Thread} and \textbf{Threadgroup} memory spaces may not overlap
with any other memory space. Consequently, no address in either memory space
aliases any address in any other memory space.

Review comment: Is this paragraph just stating part of what was said in the paragraph starting on line 311?

Review comment: Defining the commonly understood term memory access granularity and specifying the access granularity to be a byte, along with subsequent usage of the term, may be a better option than using "bit-fields".

Review comment: Bit-fields are a specific language structure that has unique properties in the memory model because of their unique packing behavior.