Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU][docs] Replace gfx940 and gfx941 with gfx942 in llvm/docs #126887

Merged
merged 1 commit into from
Feb 19, 2025

Conversation

ritter-x2a
Copy link
Member

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all documentation occurrences of gfx940/gfx941 except
for the gfx940 ISA description, which will be the subject of a separate
PR.

For SWDEV-512631

@llvmbot
Copy link
Member

llvmbot commented Feb 12, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Fabian Ritter (ritter-x2a)

Changes

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all documentation occurrences of gfx940/gfx941 except
for the gfx940 ISA description, which will be the subject of a separate
PR.

For SWDEV-512631


Patch is 24.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/126887.diff

2 Files Affected:

  • (modified) llvm/docs/AMDGPUOperandSyntax.rst (+2-2)
  • (modified) llvm/docs/AMDGPUUsage.rst (+32-65)
diff --git a/llvm/docs/AMDGPUOperandSyntax.rst b/llvm/docs/AMDGPUOperandSyntax.rst
index ff6ec6cf71ff2..e8a76322fe76a 100644
--- a/llvm/docs/AMDGPUOperandSyntax.rst
+++ b/llvm/docs/AMDGPUOperandSyntax.rst
@@ -63,7 +63,7 @@ Note: *N* and *K* must satisfy the following conditions:
 * 0 <= *K* <= 255.
 * *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.
 
-GFX90A and GFX940 have an additional alignment requirement:
+GFX90A and GFX942 have an additional alignment requirement:
 pairs of *vector* registers must be even-aligned
 (first register must be even).
 
@@ -183,7 +183,7 @@ Note: *N* and *K* must satisfy the following conditions:
 * 0 <= *K* <= 255.
 * *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.
 
-GFX90A and GFX940 have an additional alignment requirement:
+GFX90A and GFX942 have an additional alignment requirement:
 pairs of *accumulator* registers must be even-aligned
 (first register must be even).
 
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 83ec1eecb6e5e..14b3b6fce9e70 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -323,7 +323,7 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
                                                                                                         Add product
                                                                                                         names.
 
-     **GCN GFX9 (Vega)** [AMD-GCN-GFX900-GFX904-VEGA]_ [AMD-GCN-GFX906-VEGA7NM]_ [AMD-GCN-GFX908-CDNA1]_ [AMD-GCN-GFX90A-CDNA2]_ [AMD-GCN-GFX940-GFX942-CDNA3]_
+     **GCN GFX9 (Vega)** [AMD-GCN-GFX900-GFX904-VEGA]_ [AMD-GCN-GFX906-VEGA7NM]_ [AMD-GCN-GFX908-CDNA1]_ [AMD-GCN-GFX90A-CDNA2]_ [AMD-GCN-GFX942-CDNA3]_
      -----------------------------------------------------------------------------------------------------------------------
      ``gfx900``                  ``amdgcn``   dGPU  - xnack           - Absolute      - *rocm-amdhsa* - Radeon Vega
                                                                         flat          - *pal-amdhsa*    Frontier Edition
@@ -378,20 +378,6 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
                                                                                                       - Ryzen 3 Pro 4350G
                                                                                                       - Ryzen 3 Pro 4350GE
 
-     ``gfx940``                  ``amdgcn``   dGPU  - sramecc         - Architected                   *TBA*
-                                                    - tgsplit           flat
-                                                    - xnack             scratch                       .. TODO::
-                                                    - kernarg preload - Packed
-                                                                        work-item                       Add product
-                                                                        IDs                             names.
-
-     ``gfx941``                  ``amdgcn``   dGPU  - sramecc         - Architected                   *TBA*
-                                                    - tgsplit           flat
-                                                    - xnack             scratch                       .. TODO::
-                                                    - kernarg preload - Packed
-                                                                        work-item                       Add product
-                                                                        IDs                             names.
-
      ``gfx942``                  ``amdgcn``   dGPU  - sramecc         - Architected                   - AMD Instinct MI300X
                                                     - tgsplit           flat                          - AMD Instinct MI300A
                                                     - xnack             scratch
@@ -583,10 +569,10 @@ Generic processor code objects are versioned. See :ref:`amdgpu-generic-processor
                                                                                                   - ``v_dot2_f32_f16``
 
 
-     ``gfx9-4-generic``   ``amdgcn``     - ``gfx940``      - sramecc          - Architected     FP8 and BF8 instructions,
-                                         - ``gfx941``      - tgsplit            flat scratch    FP8 and BF8 conversion
-                                         - ``gfx942``      - xnack            - Packed          instructions, as well as
-                                         - ``gfx950``      - kernarg preload    work-item       instructions with XF32 format
+     ``gfx9-4-generic``   ``amdgcn``     - ``gfx942``      - sramecc          - Architected     FP8 and BF8 instructions,
+                                         - ``gfx950``      - tgsplit            flat scratch    FP8 and BF8 conversion
+                                                           - xnack            - Packed          instructions, as well as
+                                                           - kernarg preload    work-item       instructions with XF32 format
                                                                                 IDs             support are not available.
 
      ``gfx10-1-generic``  ``amdgcn``     - ``gfx1010``     - xnack            - Absolute flat   - The following instructions are
@@ -4974,7 +4960,7 @@ The fields used by CP for code objects before V3 also match those specified in
              bytes
      383:352 4 bytes COMPUTE_PGM_RSRC3               GFX6-GFX9
                                                        Reserved, must be 0.
-                                                     GFX90A, GFX940
+                                                     GFX90A, GFX942
                                                        Compute Shader (CS)
                                                        program settings used by
                                                        CP to set up
@@ -5059,7 +5045,7 @@ The fields used by CP for code objects before V3 also match those specified in
      463:460 4 bits                                  Reserved, must be 0.
      470:464 7 bits  KERNARG_PRELOAD_SPEC_LENGTH     GFX6-GFX9
                                                        - Reserved, must be 0.
-                                                     GFX90A, GFX940
+                                                     GFX90A, GFX942
                                                        - The number of dwords from
                                                          the kernarg segment to preload
                                                          into User SGPRs before kernel
@@ -5067,7 +5053,7 @@ The fields used by CP for code objects before V3 also match those specified in
                                                          :ref:`amdgpu-amdhsa-kernarg-preload`).
      479:471 9 bits  KERNARG_PRELOAD_SPEC_OFFSET     GFX6-GFX9
                                                        - Reserved, must be 0.
-                                                     GFX90A, GFX940
+                                                     GFX90A, GFX942
                                                        - An offset in dwords into the
                                                          kernarg segment to begin
                                                          preloading data into User
@@ -5093,7 +5079,7 @@ The fields used by CP for code objects before V3 also match those specified in
                                                      GFX6-GFX9
                                                        - vgprs_used 0..256
                                                        - max(0, ceil(vgprs_used / 4) - 1)
-                                                     GFX90A, GFX940
+                                                     GFX90A, GFX942
                                                        - vgprs_used 0..512
                                                        - vgprs_used = align(arch_vgprs, 4)
                                                                       + acc_vgprs
@@ -5559,7 +5545,7 @@ The fields used by CP for code objects before V3 also match those specified in
 
 ..
 
-  .. table:: compute_pgm_rsrc3 for GFX90A, GFX940
+  .. table:: compute_pgm_rsrc3 for GFX90A, GFX942
      :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table
 
      ======= ======= =============================== ===========================================================================
@@ -9970,15 +9956,15 @@ only accessed by a single thread, and is always write-before-read, there is
 never a need to invalidate these entries from the L1 cache. Hence all cache
 invalidates are done as ``*_vol`` to only invalidate the volatile cache lines.
 
-The code sequences used to implement the memory model for GFX940, GFX941, GFX942
-are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx941-gfx942-table`.
+The code sequences used to implement the memory model for GFX942 are defined in
+table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx942-table`.
 
-  .. table:: AMDHSA Memory Model Code Sequences GFX940, GFX941, GFX942
-     :name: amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx941-gfx942-table
+  .. table:: AMDHSA Memory Model Code Sequences GFX942
+     :name: amdgpu-amdhsa-memory-model-code-sequences-gfx942-table
 
      ============ ============ ============== ========== ================================
      LLVM Instr   LLVM Memory  LLVM Memory    AMDGPU     AMDGPU Machine Code
-                  Ordering     Sync Scope     Address    GFX940, GFX941, GFX942
+                  Ordering     Sync Scope     Address    GFX942
                                               Space
      ============ ============ ============== ========== ================================
      **Non-Atomic**
@@ -10013,18 +9999,12 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
      load         *none*       *none*         - local    1. ds_load
      store        *none*       *none*         - global   - !volatile & !nontemporal
                                               - generic
-                                              - private    1. GFX940, GFX941
+                                              - private    1. GFX942
                                               - constant        buffer/global/flat_store
-                                                                sc0=1 sc1=1
-                                                              GFX942
-                                                                buffer/global/flat_store
 
                                                          - !volatile & nontemporal
 
-                                                           1. GFX940, GFX941
-                                                                buffer/global/flat_store
-                                                                nt=1 sc0=1 sc1=1
-                                                              GFX942
+                                                           1. GFX942
                                                                 buffer/global/flat_store
                                                                 nt=1
 
@@ -10696,11 +10676,8 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
 
      **Release Atomic**
      ------------------------------------------------------------------------------------
-     store atomic release      - singlethread - global   1. GFX940, GFX941
+     store atomic release      - singlethread - global   1. GFX942
                                - wavefront    - generic       buffer/global/flat_store
-                                                              sc0=1 sc1=1
-                                                            GFX942
-                                                              buffer/global/flat_store
 
      store atomic release      - singlethread - local    *If TgSplit execution mode,
                                - wavefront               local address space cannot
@@ -10738,10 +10715,7 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
                                                              store that is being
                                                              released.
 
-                                                         2. GFX940, GFX941
-                                                              buffer/global/flat_store
-                                                              sc0=1 sc1=1
-                                                            GFX942
+                                                         2. GFX942
                                                               buffer/global/flat_store
                                                               sc0=1
      store atomic release      - workgroup    - local    *If TgSplit execution mode,
@@ -10802,10 +10776,7 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9
                                                              store that is being
                                                              released.
 
-                                                         3. GFX940, GFX941
-                                                              buffer/global/flat_store
-                                                              sc0=1 sc1=1
-                                                            GFX942
+                                                         3. GFX942
                                                               buffer/global/flat_store
                                                               sc1=1
      store atomic release      - system       - global   1. buffer_wbl2 sc0=1 sc1=1
@@ -17563,11 +17534,7 @@ in this description.
 
     CDNA 2        :doc:`GFX9<AMDGPU/AMDGPUAsmGFX9>`             :doc:`gfx90a<AMDGPU/AMDGPUAsmGFX90a>`
 
-    CDNA 3        :doc:`GFX9<AMDGPU/AMDGPUAsmGFX9>`             :doc:`gfx940<AMDGPU/AMDGPUAsmGFX940>`
-
-                                                                :doc:`gfx941<AMDGPU/AMDGPUAsmGFX940>`
-
-                                                                :doc:`gfx942<AMDGPU/AMDGPUAsmGFX940>`
+    CDNA 3        :doc:`GFX9<AMDGPU/AMDGPUAsmGFX9>`             :doc:`gfx942<AMDGPU/AMDGPUAsmGFX940>`
 
     RDNA 1        :doc:`GFX10 RDNA1<AMDGPU/AMDGPUAsmGFX10>`     :doc:`gfx1010<AMDGPU/AMDGPUAsmGFX10>`
 
@@ -17605,7 +17572,7 @@ combinations of operands, refer to one of instruction set architecture manuals
 [AMD-GCN-GFX6]_, [AMD-GCN-GFX7]_, [AMD-GCN-GFX8]_,
 [AMD-GCN-GFX900-GFX904-VEGA]_, [AMD-GCN-GFX906-VEGA7NM]_,
 [AMD-GCN-GFX908-CDNA1]_, [AMD-GCN-GFX90A-CDNA2]_,
-[AMD-GCN-GFX940-GFX942-CDNA3]_, [AMD-GCN-GFX10-RDNA1]_, [AMD-GCN-GFX10-RDNA2]_,
+[AMD-GCN-GFX942-CDNA3]_, [AMD-GCN-GFX10-RDNA1]_, [AMD-GCN-GFX10-RDNA2]_,
 [AMD-GCN-GFX11-RDNA3]_ and [AMD-GCN-GFX11-RDNA3.5]_.
 
 Operands
@@ -18118,7 +18085,7 @@ terminated by an ``.end_amdhsa_kernel`` directive.
                                                                                                :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`
      ``.amdhsa_user_sgpr_private_segment_buffer``             0                   GFX6-GFX10   Controls ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER in
                                                                                   (except      :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
-                                                                                  GFX940)
+                                                                                  GFX942)
      ``.amdhsa_user_sgpr_dispatch_ptr``                       0                   GFX6-GFX12   Controls ENABLE_SGPR_DISPATCH_PTR in
                                                                                                :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
      ``.amdhsa_user_sgpr_queue_ptr``                          0                   GFX6-GFX12   Controls ENABLE_SGPR_QUEUE_PTR in
@@ -18129,7 +18096,7 @@ terminated by an ``.end_amdhsa_kernel`` directive.
                                                                                                :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
      ``.amdhsa_user_sgpr_flat_scratch_init``                  0                   GFX6-GFX10   Controls ENABLE_SGPR_FLAT_SCRATCH_INIT in
                                                                                   (except      :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
-                                                                                  GFX940)
+                                                                                  GFX942)
      ``.amdhsa_user_sgpr_private_segment_size``               0                   GFX6-GFX12   Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
                                                                                                :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
      ``.amdhsa_wavefront_size32``                             Target              GFX10-GFX12  Controls ENABLE_WAVEFRONT_SIZE32 in
@@ -18140,8 +18107,8 @@ terminated by an ``.end_amdhsa_kernel`` directive.
                                                                                                :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
      ``.amdhsa_system_sgpr_private_segment_wavefront_offset`` 0                   GFX6-GFX10   Controls ENABLE_PRIVATE_SEGMENT in
                                                                                   (except      :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
-                                                                                  GFX940)
-     ``.amdhsa_enable_private_segment``                       0                   GFX940,      Controls ENABLE_PRIVATE_SEGMENT in
+                                                                                  GFX942)
+     ``.amdhsa_enable_private_segment``                       0                   GFX942,      Controls ENABLE_PRIVATE_SEGMENT in
                                                                                   GFX11-GFX12  :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
      ``.amdhsa_system_sgpr_workgroup_id_x``                   1                   GFX6-GFX12   Controls ENABLE_SGPR_WORKGROUP_ID_X in
                                                                                                :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
@@ -18162,14 +18129,14 @@ terminated by an ``.end_amdhsa_kernel`` directive.
                                                                                                Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
                                                                                                :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
      ``.amdhsa_accum_offset``                                 Required            GFX90A,      Offset of a first AccVGPR in the unified register file.
-                                                                                  GFX940       Used to calculate ACCUM_OFFSET in
+                                                                                  GFX942       Used to calculate ACCUM_OFFSET in
                                                                                                :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`.
      ``.amdhsa_reserve_vcc``                                  1                   GFX6-GFX12   Whether the kernel may use the special VCC SGPR.
                                                                                                Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
                                                                                                :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
      ``.amdhsa_reserve_flat_scratch``                         1                   GFX7-GFX10   Whether the kernel may use flat instructions to access
                                                                          ...
[truncated]

@ritter-x2a ritter-x2a marked this pull request as ready for review February 12, 2025 10:55
@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/rm-gfx940-gfx941-featureforecestoresc0sc1 branch from 79c331e to baa8add Compare February 13, 2025 09:31
@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/rm-gfx940-gfx941-docs branch 2 times, most recently from 6bc2689 to a5d359b Compare February 13, 2025 14:57
@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/rm-gfx940-gfx941-featureforecestoresc0sc1 branch from baa8add to 731f633 Compare February 13, 2025 14:57
@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/rm-gfx940-gfx941-docs branch from a5d359b to ca4a620 Compare February 18, 2025 08:22
@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/rm-gfx940-gfx941-featureforecestoresc0sc1 branch from 731f633 to 05da561 Compare February 18, 2025 08:22
Copy link
Member Author

ritter-x2a commented Feb 19, 2025

Merge activity

  • Feb 19, 3:48 AM EST: A user started a stack merge that includes this pull request via Graphite.
  • Feb 19, 4:29 AM EST: Graphite rebased this pull request as part of a merge.
  • Feb 19, 4:31 AM EST: A user merged this pull request with Graphite.

@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/rm-gfx940-gfx941-featureforecestoresc0sc1 branch from 05da561 to 4e77aa4 Compare February 19, 2025 09:24
Base automatically changed from users/ritter-x2a/rm-gfx940-gfx941-featureforecestoresc0sc1 to main February 19, 2025 09:26
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all documentation occurrences of gfx940/gfx941 except
for the gfx940 ISA description, which will be the subject of a separate
PR.

For SWDEV-512631
@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/rm-gfx940-gfx941-docs branch from ca4a620 to a0b0253 Compare February 19, 2025 09:29
@ritter-x2a ritter-x2a merged commit db59708 into main Feb 19, 2025
6 of 8 checks passed
@ritter-x2a ritter-x2a deleted the users/ritter-x2a/rm-gfx940-gfx941-docs branch February 19, 2025 09:31
Prakhar-Dixit pushed a commit to Prakhar-Dixit/llvm-project that referenced this pull request Feb 19, 2025
…vm#126887)

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all documentation occurrences of gfx940/gfx941 except
for the gfx940 ISA description, which will be the subject of a separate
PR.

For SWDEV-512631
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants