forked from llvm/llvm-project
[libc++] Set feature-test macro __cpp_lib_atomic_float (#127559) #1
Closed
The corresponding feature was implemented in LLVM 18 (by llvm#67799), but this FTM wasn't added before.
It is always safe to add poison-generating flags for `BinOp Y, Identity`. Proof: https://alive2.llvm.org/ce/z/8BLEpq and https://alive2.llvm.org/ce/z/584Bb4 Then we can propagate flags from one of the arms: ``` select Cond, Y, (BinOp flags Y, Z) -> select Cond, (BinOp flags Y, Identity), (BinOp flags Y, Z) -> BinOp flags Y, (select Cond, Identity, Z) ``` This patch is proposed to avoid information loss caused by llvm#127390.
Some check labels in the basic.ll test are simply f[1-3], which matches all cases where the function entry is hot, cold, or uncategorized. Since the actual test results for each label have mixed cases, the current labels can be considered ambiguous. This patch specifies exact matches for each label to ensure more precise validation and to prevent potential regressions in the future.
The `-fcf-protection` flag is now also used to enable CFI features for the RISC-V target, so it's not suitable to define `__CET__` solely based on the flag anymore. This patch moves the definition of the `__CET__` macro into X86 target hook, so only X86 targets with the `-fcf-protection` flag would enable the `__CET__` macro. See llvm#109784 and llvm#112477 for the adoption of `-fcf-protection` flag for RISC-V targets.
This change allows array variables to copy-initialize from other arrays. It also corrects a small error in HLSL C-Style casting that did not error on casting to arrays if elementwise and splat conversions fail. Fixes llvm#127551
) Lack of these entries caused clang to crash on the following code: ```c __m256bh fun(__m256bh arg) { return arg; } __m256bh run() { __m256bh arg= {0}; fun(arg); } ``` It caused the FastISel to fail since it handled the call lowering basing on the X86CallingConv table. Curiously, if FastISel fails somewhere down the line and selectionDAGISel fallbacks, the crash does not occur. Following code _does not_ crash: ```c __m256bh fun(__m256bh arg) { return arg; } __m256bh run() { __m256bh arg= {0}; return fun(arg); } ``` This is puzzling to me. Obviously, if FastISel fails then compiler fallbacks to something else to lower these calls -- but since the X86callingConv table _doesn't_ have entries for vNbf16 how does this other thing manage not to crash? It has to use some other mechanism, one which doesn't use the table. This rises following questions: - how is this lowering accomplished without, presumably, using the CallingConv entries? - why is the table not used? I mean this points to some logic duplication (fastISel way vs. the other bug-free way) - How to properly test this? There is a test for vNbf16 values, but it also must not be using the FastISel path? This duplication of logic makes it hard to test this, since we don't have direct control whether the FastISel path or the other one is used. Nonetheless, this PR fixes the crash, though I didn't create a test for it, since I am unsure yet how it should look like. I would like to learn how the working non-FastISel mechanism works; I tried looking for it, but didn't yet manage to find anything
…amdgpu-sw-lower-lds. (llvm#121214) "infer-address-spaces" pass replaces all refinable generic pointers with equivalent specific pointers. At -O0 optimisation level, infer-address-spaces pass doesn't run in the pipeline. "amdgpu-sw-lower-lds" pass instruments memory operations on addrspace(3) ptrs. Since, extra addrspacecasts are present from lds to flat addrspaces at -O0 and the actual store/load memory instructions are now on flat addrspace, these addrspacecast need to be handled in the amdgpu-sw-lower-lds pass itself. This patch lowers the lds ptr first to the corresponding ptr in the global memory from the asan_malloc. Then replaces the original cast with addrspacecast from global ptr to flat ptr.
…7626) isOperandLegal mostly implemented the VALU operand rules, and largely ignored SALU restrictions. This theoretically avoids folding literals into SALU insts which already have a literal operand. This issue is currently avoided due to a bug in SIFoldOperands; this change will allow using raw operand legality rules. This breaks the formation of s_fmaak_f32 in SIFoldOperands, but it probably should not have been forming there in the first place. TwoAddressInsts or RA should generally handle that, and this only worked by accident.
…n `v_pk_mov_b32` (llvm#127715) This PR serves as a preliminary step, adding a test case for register coalescer on v_pk_mov_b32. It is intended to demonstrate the code changes introduced in an upcoming PR.
…option (llvm#127442) In SPGO lto mode, linker needs -lto-sample-profile option to set sample profile file. Linux adds this option by transferring fprofile-sample-use to -plugin-opt=sample-profile=, which is alias of lto-sample-profile. (in clang\lib\Driver\ToolChains\CommonArgs.cpp: tools::addLTOOptions()). But clang on Windows misses the transferring. So add it now.
For function declarations (i.e. func op has no entry block), the FunctionOpInterface method `insertArgument` and `eraseArgument` will cause segfault. This PR guards against manipulation of empty entry block by checking whether func op is external. An example can be seen in google/heir#1324 The segfault trace ``` #1 0x0000560f1289d9db PrintStackTraceSignalHandler(void*) /proc/self/cwd/external/llvm-project/llvm/lib/Support/Unix/Signals.inc:874:1 llvm#2 0x0000560f1289b116 llvm::sys::RunSignalHandlers() /proc/self/cwd/external/llvm-project/llvm/lib/Support/Signals.cpp:105:5 llvm#3 0x0000560f1289e145 SignalHandler(int) /proc/self/cwd/external/llvm-project/llvm/lib/Support/Unix/Signals.inc:415:1 llvm#4 0x00007f829a3d9520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520) llvm#5 0x0000560f1257f8bc void __gnu_cxx::new_allocator<mlir::BlockArgument>::construct<mlir::BlockArgument, mlir::BlockArgument>(mlir::BlockArgument*, mlir::BlockArgument&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/new_allocator.h:162:23 llvm#6 0x0000560f1257f84d void std::allocator_traits<std::allocator<mlir::BlockArgument> >::construct<mlir::BlockArgument, mlir::BlockArgument>(std::allocator<mlir::BlockArgument>&, mlir::BlockArgument*, mlir::BlockArgument&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:520:2 llvm#7 0x0000560f12580498 void std::vector<mlir::BlockArgument, std::allocator<mlir::BlockArgument> >::_M_insert_aux<mlir::BlockArgument>(__gnu_cxx::__normal_iterator<mlir::BlockArgument*, std::vector<mlir::BlockArgument, std::allocator<mlir::BlockArgument> > >, mlir::BlockArgument&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/vector.tcc:405:7 llvm#8 0x0000560f1257cf7e std::vector<mlir::BlockArgument, std::allocator<mlir::BlockArgument> >::insert(__gnu_cxx::__normal_iterator<mlir::BlockArgument const*, std::vector<mlir::BlockArgument, std::allocator<mlir::BlockArgument> > >, mlir::BlockArgument const&) 
/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/vector.tcc:154:6 llvm#9 0x0000560f1257b349 mlir::Block::insertArgument(unsigned int, mlir::Type, mlir::Location) /proc/self/cwd/external/llvm-project/mlir/lib/IR/Block.cpp:178:13 llvm#10 0x0000560f123d2a1c mlir::function_interface_impl::insertFunctionArguments(mlir::FunctionOpInterface, llvm::ArrayRef<unsigned int>, mlir::TypeRange, llvm::ArrayRef<mlir::DictionaryAttr>, llvm::ArrayRef<mlir::Location>, unsigned int, mlir::Type) /proc/self/cwd/external/llvm-project/mlir/lib/Interfaces/FunctionInterfaces.cpp:232:11 llvm#11 0x0000560f0be6b727 mlir::detail::FunctionOpInterfaceTrait<mlir::func::FuncOp>::insertArguments(llvm::ArrayRef<unsigned int>, mlir::TypeRange, llvm::ArrayRef<mlir::DictionaryAttr>, llvm::ArrayRef<mlir::Location>) /proc/self/cwd/bazel-out/k8-dbg/bin/external/llvm-project/mlir/include/mlir/Interfaces/FunctionInterfaces.h.inc:809:7 llvm#12 0x0000560f0be6b536 mlir::detail::FunctionOpInterfaceTrait<mlir::func::FuncOp>::insertArgument(unsigned int, mlir::Type, mlir::DictionaryAttr, mlir::Location) /proc/self/cwd/bazel-out/k8-dbg/bin/external/llvm-project/mlir/include/mlir/Interfaces/FunctionInterfaces.h.inc:796:7 ```
Co-authored-by: Matt Arsenault
…llvm#127063) recent change e76739e has exposed use after free in GetCursorSource() function that returned pointer to a disposed CXString
…obind`s `stubgen` (llvm#127584) This PR makes several improvements to the stubs that are created by `mlir_(attribute|type|value)_subclass`. First, the PR sets the `__module__` attribute of the classes generated by the nanobind adaptors for attributes, types, and values (via `mlir_(attribute|type|value)_subclass`). By default, the `__module__` property is set to `importlib._bootstrap`, which isn't where we want the new class to live. The new logic sets the property to the name of the module provided as `scope` instead. This also makes nanobind's `stubgen` generate stubs for those classes properly, which ignores classes whose `__module__` does not correspond to the module it is generating stubs for. This resolves llvm#127518. Second, the PR overwrites the function signatures generated by `stubgen` to a format that uses the desired type names (e.g., `mlir.ir.Attribute` instead of `MlirAttribute`). Finally, the PR piggy-backs some minor doc and style improvements to `PythonAdaptors.h`. --------- Signed-off-by: Ingo Müller <[email protected]>
This patch fixes: clang/tools/c-index-test/c-index-test.c:1240:15: error: mixing declarations and code is a C99 extension [-Werror,-Wdeclaration-after-statement] clang/tools/c-index-test/c-index-test.c:1367:14: error: mixing declarations and code is a C99 extension [-Werror,-Wdeclaration-after-statement] clang/tools/c-index-test/c-index-test.c:1468:14: error: mixing declarations and code is a C99 extension [-Werror,-Wdeclaration-after-statement]
This is a (no-op) locale version of strftime.
…#127640) CaptureTracking considers insertions into aggregates and vectors as captures. As such, extractions from aggregates and vectors are escape sources. A non-escaping identified local cannot alias with the result of an extractvalue/extractelement. Fixes llvm#126670.
…27705) This patch adds handling of the RISCVISD::VCPOP_VL node in RISCVTargetLowering::computeKnownBitsForTargetNode. It eliminates redundant zero-extension instructions.
…atization (llvm#127754) Issue: Compilation abnormally terminates in parallel default(private) Documentation reference: A threadprivate variable must not appear as the base variable of a list item in any clause except for the copyin and copyprivate clauses Explanation: From the reference, the threadprivate symbols cannot be used in the DSA clauses, which in turn means, the symbol can be skipped for default privatization Fixes llvm#123535
…lvm#125826) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631 and SWDEV-512633
…m#127455) Delete `equivalenceAnalysis`, which has been incorporated into the `getAliasingValues` API. Also add an additional test case to ensure that equivalence is properly propagated across function boundaries.
…5836) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631
…patterns. (llvm#127643) Handles both BWI and non-BWI cases (skips PMOV*XBW without BWI). The vector-interleaved-store-i8-stride-8.ll VPTERNLOG diffs are due to better value tracking now recognizing the zero-extension patterns where before it was any-extension
[libc++] Set feature-test macro `__cpp_lib_atomic_float` (llvm#127559)
The corresponding feature was implemented in LLVM 18 (by llvm#67799), but
this FTM wasn't added before.
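As a rough illustration of why the macro matters to users, the sketch below picks between the floating-point `fetch_add` and a compare-exchange fallback depending on whether the library advertises `__cpp_lib_atomic_float`. The helper name `atomic_add` is hypothetical and not part of libc++ or of this patch.

```cpp
#include <atomic>

// Hypothetical helper: adds `delta` to `value` atomically and returns the
// previous value. Not part of libc++; for illustration only.
inline float atomic_add(std::atomic<float>& value, float delta) {
#if defined(__cpp_lib_atomic_float) && __cpp_lib_atomic_float >= 201711L
  // The library advertises floating-point atomics, so fetch_add is available.
  return value.fetch_add(delta);
#else
  // Fallback for libraries (or language modes) without the feature:
  // retry a compare-exchange until the update succeeds.
  float old = value.load();
  while (!value.compare_exchange_weak(old, old + delta)) {
    // `old` is refreshed with the current value on failure.
  }
  return old;
#endif
}
```

Before this change, code like this would presumably have taken the fallback path with libc++ even though the feature itself was already implemented.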
[InstCombine] Propagate flags in `foldSelectICmpAndBinOp` (llvm#127437)
It is always safe to add poison-generating flags for `BinOp Y, Identity`.
Proof: https://alive2.llvm.org/ce/z/8BLEpq
and https://alive2.llvm.org/ce/z/584Bb4
Then we can propagate flags from one of the arms:
`select Cond, Y, (BinOp flags Y, Z)` -> `select Cond, (BinOp flags Y, Identity), (BinOp flags Y, Z)` -> `BinOp flags Y, (select Cond, Identity, Z)`
This patch is proposed to avoid information loss caused by
llvm#127390.
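The value-level equivalence behind this fold can be sketched in plain C++ with addition as the `BinOp` and 0 as its identity. This only models the computed values on unsigned integers; it does not model poison-generating flags, and both function names are made up for the example.

```cpp
#include <cstdint>

// Models `select Cond, Y, (add Y, Z)`.
inline std::uint32_t select_of_binop(bool cond, std::uint32_t y, std::uint32_t z) {
  return cond ? y : y + z;
}

// Models `add Y, (select Cond, 0, Z)`, where 0 is the identity of add.
inline std::uint32_t binop_of_select(bool cond, std::uint32_t y, std::uint32_t z) {
  return y + (cond ? 0u : z);
}
```

Both functions return the same result for every input, which is the part of the argument that holds independently of the flags; the Alive2 links above cover the flag propagation itself.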
[PGO][test] Specify exact match for check labels (llvm#117376)
Some check labels in the basic.ll test are simply f[1-3], which matches
all cases where the function entry is hot, cold, or uncategorized. Since
the actual test results for each label have mixed cases, the current
labels can be considered ambiguous.
This patch specifies exact matches for each label to ensure more precise
validation and to prevent potential regressions in the future.
[lld][WebAssembly] Fix warnings in test. NFC (llvm#127714)
[clang][X86] Only define CET macro for X86 targets (llvm#127616)
The `-fcf-protection` flag is now also used to enable CFI features for
the RISC-V target, so it's not suitable to define `__CET__` solely based
on the flag anymore. This patch moves the definition of the `__CET__`
macro into the X86 target hook, so only X86 targets with the
`-fcf-protection` flag enable the `__CET__` macro.
See llvm#109784 and llvm#112477 for the adoption of the
`-fcf-protection` flag for RISC-V targets.
[HLSL] Allow arrays to copy-initialize (llvm#127557)
This change allows array variables to copy-initialize from other arrays.
It also corrects a small error in HLSL C-Style casting that did not
error on casting to arrays if elementwise and splat conversions fail.
Fixes llvm#127551
[X86] Add missing vNbf16 handling in X86CallingConv.td file (llvm#127102)
Lack of these entries caused clang to crash on the following code:
`__m256bh fun(__m256bh arg) { return arg; } __m256bh run() { __m256bh arg= {0}; fun(arg); }`
It caused FastISel to fail, since it handles the call lowering based
on the X86CallingConv table.
Curiously, if FastISel fails somewhere down the line and
SelectionDAGISel takes over, the crash does not occur. The following code
does not crash:
`__m256bh fun(__m256bh arg) { return arg; } __m256bh run() { __m256bh arg= {0}; return fun(arg); }`
This is puzzling to me. Obviously, if FastISel fails then the compiler
falls back to something else to lower these calls -- but since the
X86CallingConv table doesn't have entries for vNbf16, how does this
other thing manage not to crash? It has to use some other mechanism, one
which doesn't use the table. This raises the following questions:
- How is this lowering accomplished without, presumably, using the
CallingConv entries?
- Why is the table not used? This points to some logic
duplication (the FastISel way vs. the other bug-free way).
- How to properly test this? There is a test for vNbf16 values, but it
also must not be using the FastISel path? This duplication of logic
makes it hard to test this, since we don't have direct control over whether
the FastISel path or the other one is used.
Nonetheless, this PR fixes the crash, though I didn't create a test for
it, since I am unsure yet what it should look like. I would like to learn
how the working non-FastISel mechanism works; I tried looking for it,
but didn't yet manage to find anything.
[AMDGPU] Handle lowering addrspace casts from LDS to FLAT address in amdgpu-sw-lower-lds. (llvm#121214)
"infer-address-spaces" pass replaces all refinable generic pointers with
equivalent specific pointers.
At -O0 optimisation level, infer-address-spaces pass doesn't run in the
pipeline.
"amdgpu-sw-lower-lds" pass instruments memory operations on addrspace(3)
ptrs. Since extra addrspacecasts from LDS to flat address spaces are
present at -O0, and the actual store/load memory instructions are now
on the flat address space, these addrspacecasts need to be handled in the
amdgpu-sw-lower-lds pass itself. This patch first lowers the LDS ptr to
the corresponding ptr in the global memory from the asan_malloc. It then
replaces the original cast with an addrspacecast from the global ptr to a
flat ptr.
AMDGPU: Teach isOperandLegal about SALU literal restrictions (llvm#127626)
isOperandLegal mostly implemented the VALU operand rules, and
largely ignored SALU restrictions. This theoretically avoids
folding literals into SALU insts which already have a literal
operand. This issue is currently avoided due to a bug in
SIFoldOperands; this change will allow using raw operand
legality rules.
This breaks the formation of s_fmaak_f32 in SIFoldOperands,
but it probably should not have been forming there in the first
place. TwoAddressInsts or RA should generally handle that,
and this only worked by accident.
[Sparc] Use MCRegister. NFC
[NFC][AMDGPU] Pre-commit a test case of checking register coalescer on `v_pk_mov_b32` (llvm#127715)
This PR serves as a preliminary step, adding a test case for register coalescer on v_pk_mov_b32. It is intended to demonstrate the code changes introduced in an upcoming PR.
[Driver][MSVC] Pass profile file to lld-link via -lto-sample-profile option (llvm#127442)
In SPGO lto mode, linker needs -lto-sample-profile option to set sample
profile file.
Linux adds this option by transferring fprofile-sample-use to
-plugin-opt=sample-profile=, which is alias of lto-sample-profile. (in
clang\lib\Driver\ToolChains\CommonArgs.cpp: tools::addLTOOptions()).
But clang on Windows misses the transferring. So add it now.
[mlir] Fix FunctionOpInterface impl for external func (llvm#124693)
For function declarations (i.e. func op has no entry block), the
FunctionOpInterface methods `insertArgument` and `eraseArgument` will
cause a segfault. This PR guards against manipulation of an empty entry
block by checking whether the func op is external.
An example can be seen in google/heir#1324
The segfault trace
[ELF] Improve INPUT_SECTION_FLAGS test
[AMDGPU] Add `isAsCheapAsAMove` for `v_pk_mov_b32` (llvm#127632)
Co-authored-by: Matt Arsenault
[RISCV][VLOPT] Add vfsqrt/vfrsqrt7 instruction to isSupportInstr (llvm#127462)
[clang] fix use after free in clang/tools/c-index-test/c-index-test.c (llvm#127063)
A recent change (e76739e) exposed a use after free in the
GetCursorSource() function, which returned a pointer to
a disposed CXString.
[mlir:python] Improve `mlir_(attribute|type|value)_subclass` for `nanobind`s `stubgen` (llvm#127584)
This PR makes several improvements to the stubs that are created by
`mlir_(attribute|type|value)_subclass`.
First, the PR sets the `__module__` attribute of the classes generated
by the nanobind adaptors for attributes, types, and values (via
`mlir_(attribute|type|value)_subclass`). By default, the `__module__`
property is set to `importlib._bootstrap`, which isn't where we want the
new class to live. The new logic sets the property to the name of the
module provided as `scope` instead. This also makes nanobind's `stubgen`
generate stubs for those classes properly, which ignores classes whose
`__module__` does not correspond to the module it is generating stubs
for. This resolves llvm#127518.
Second, the PR overwrites the function signatures generated by `stubgen`
to a format that uses the desired type names (e.g., `mlir.ir.Attribute`
instead of `MlirAttribute`).
Finally, the PR piggy-backs some minor doc and style improvements to
`PythonAdaptors.h`.
Signed-off-by: Ingo Müller
[c-index-test] Fix warnings
This patch fixes:
clang/tools/c-index-test/c-index-test.c:1240:15: error: mixing
declarations and code is a C99 extension
[-Werror,-Wdeclaration-after-statement]
clang/tools/c-index-test/c-index-test.c:1367:14: error: mixing
declarations and code is a C99 extension
[-Werror,-Wdeclaration-after-statement]
clang/tools/c-index-test/c-index-test.c:1468:14: error: mixing
declarations and code is a C99 extension
[-Werror,-Wdeclaration-after-statement]
[MLIR][emitc]: Remove unused functions (NFC)
[libc] Add strftime_l (llvm#127708)
This is a (no-op) locale version of strftime.
[AA] Consider extractvalue and extractelement as escape sources (llvm#127640)
CaptureTracking considers insertions into aggregates and vectors as
captures. As such, extractions from aggregates and vectors are escape
sources. A non-escaping identified local cannot alias with the result of
an extractvalue/extractelement.
Fixes llvm#126670.
Revert "[libc] Add strftime_l" (llvm#127766)
Reverts llvm#127708
[RISCVISel] Compute leading zeros for RISCVISD::VCPOP_VL node (llvm#127705)
This patch adds handling of the RISCVISD::VCPOP_VL node in
RISCVTargetLowering::computeKnownBitsForTargetNode. It eliminates
redundant zero-extension instructions.
[Flang][OpenMP] Skip threadprivate HostAssoc symbols for default privatization (llvm#127754)
Issue: Compilation abnormally terminates in parallel default(private)
Documentation reference:
A threadprivate variable must not appear as the base variable of a list
item in any clause except for the copyin and copyprivate clauses
Explanation:
From the reference, the threadprivate symbols cannot be used in the DSA
clauses, which in turn means, the symbol can be skipped for default
privatization
Fixes llvm#123535
[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (llvm#125826)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
For SWDEV-512631 and SWDEV-512633
[AMDGPU] Add missing gfx architectures to AddFlangOffloadRuntime.cmake (llvm#125827)
[mlir][bufferization] Module bufferization: Delete obsolete code (llvm#127455)
Delete `equivalenceAnalysis`, which has been incorporated into the
`getAliasingValues` API. Also add an additional test case to ensure that
equivalence is properly propagated across function boundaries.
[AMDGPU][MLIR] Replace gfx940 and gfx941 with gfx942 in MLIR (llvm#125836)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
For SWDEV-512631
[X86] matchUnaryShuffle - add support for matching 512-bit extension patterns. (llvm#127643)
Handles both BWI and non-BWI cases (skips PMOV*XBW without BWI).
The vector-interleaved-store-i8-stride-8.ll VPTERNLOG diffs are due to
better value tracking now recognizing the zero-extension patterns where
before it was any-extension
[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (llvm#126762)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.
For SWDEV-512631
[BitcodeReader] Use `poison` instead of `undef` to represent unsupported constexprs in metadata (llvm#127665)
Metadata that references unsupported constant expressions can be
represented with `poison` metadata instead of `undef` metadata.
[CodeGen] Use __extendhfsf2 and __truncsfhf2 by default (llvm#126880)
The standard libcalls for half to float and float to half conversion are
__extendhfsf2 and __truncsfhf2. However, LLVM currently uses
__gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these
libcalls are an ARM-ism and only provided by libgcc on that platform.
compiler-rt always provides both libcalls.
Use the standard libcalls by default, and only use the __gnu libcalls on
ARM.
[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (llvm#126763)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.
Documentation changes will follow.
For SWDEV-512631
[clang][bytecode] Fix three-way unordered non-pointer comparisions (llvm#127759)
This can happen with non-pointers, but we shouldn't diagnose it in
that case.
[AMDGPU] Remove FeatureForceStoreSC0SC1 (llvm#126878)
This was only used for gfx940 and gfx941, which have since been removed.
For SWDEV-512631
[AMDGPU][docs] Replace gfx940 and gfx941 with gfx942 in llvm/docs (llvm#126887)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all documentation occurrences of gfx940/gfx941 except
for the gfx940 ISA description, which will be the subject of a separate
PR.
For SWDEV-512631
[X86] getFauxShuffleMask - add support for vXi64/vXf64 concat_vectors decoding (llvm#127630)
Similar to insert_subvector - limit this to vXi64 vector cases to make the most of cross lane shuffles (for now).
[AMDGPU][docs][NFC] Replace gfx940 with gfx942 in the gfx940 ISA doc (llvm#126906)
gfx940 and gfx941 are no longer supported. This is the last one of a
series of PRs to remove them from the code base.
The ISA documentation still contains a lot of links and file names with
the "gfx940" identifier. Changing them to "gfx942" is probably not worth
the cost of breaking all URLs to these pages that users might have saved
in the past.
For SWDEV-512631
[Flang] Check if two ArrayConstructor's are Equal (llvm#121181)
This also includes comparing the two ImpliedDo
Details
elements and type
upper, stride and values
Fixes: llvm#104526
[clang] Lower non-builtin sincos[f|l] calls to llvm.sincos.* when -fno-math-errno is set (llvm#121763)
This will allow vectorizing these calls (after a few more patches). This
should not change the codegen for targets that enable the use of AA
during the codegen (in `TargetSubtargetInfo::useAA()`). This includes
targets such as AArch64. This notably does not include x86 but can be
worked around by passing `-mllvm -combiner-global-alias-analysis=true`
to clang.
Follow up to llvm#114086.
[libclc] Improve nextafter behaviour around zero (llvm#127469)
This commit improves the behaviour of (_clc)nextafter around zero.
Specifically, the nextafter value of very small negative numbers in the
positive direction is now negative zero. Previously we'd return positive
zero.
This behaviour is not required as far as OpenCL is concerned: at least,
the CTS isn't testing for it. However, this change does bring our
implementation into bit-equivalence with (libstdc++'s implementation of)
std::nextafter, tested on all possible values of 32-bit float towards
both positive and negative INFINITY.
Furthermore, since the implementation of libclc's floating-point 'rtp'
and 'rtz' conversions use __clc_nextafter, the previous behaviour was
resulting in CTS validation issues. For example, when converting float
-0x1.000002p-25 to half, rounding towards zero or positive infinity,
nextafter was returning +0.0, whereas the correct conversion requires us
to return -0.0.
We could work around this issue in the conversion functions, but since
the change to nextafter is small enough and the behaviour around zero
matches libstdc++, the fix feels at home there.
This commit also converts several variables to unsigned types to avoid
undefined behaviour surrounding signed underflow on the subtractions.
It also converts some variables to be kept in floating-point types, using
fabs to get the absolute value rather than by bit-hacking.
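The behaviour around zero described above can be checked against the host C++ library with a small sketch. It uses `std::nextafter` rather than libclc's `__clc_nextafter`, and the two helper names are hypothetical; the expectation of negative zero follows the commit message's description of libstdc++ and may differ on platforms that flush subnormals.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: steps `x` one representable float towards +INFINITY.
inline float next_toward_positive(float x) {
  return std::nextafter(x, std::numeric_limits<float>::infinity());
}

// True only for -0.0f: compares equal to zero but has the sign bit set.
inline bool is_negative_zero(float x) {
  return x == 0.0f && std::signbit(x);
}
```

Calling `next_toward_positive` on the negative float closest to zero, `-std::numeric_limits<float>::denorm_min()`, is the case the commit changes: the result keeps its sign bit instead of becoming positive zero.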
[MLIR][Affine] Fix affine.parallel op verifier (llvm#127611)
Fix affine.parallel op verifier for missing check on zero result lower
or upper bound maps. lb/ub maps should have at least one result.
Fixes: llvm#120186
[flang][OpenMP]Improve support for DECLARE REDUCTION (llvm#127088)
Part of DECLARE REDUCTION was already supported by the parser, but
the semantics to add the reduction identifier weren't implemented.
The semantics would not accept the name given by the reduction, so a few
lines were added to support that.
Some tests were in place but not quite working, so those were fixed up too.
New tests are added for unparsing and the parse tree, as well as for
checking the symbolic name being generated.
Lowering of DECLARE REDUCTION is not supported in this patch, and a test
that it hits the relevant TODO is in this patch (most of this was
already existing, but not actually testing the TODO message).
[AArch64] Add optional extensions enabled on Grace (llvm#127620)
Enable optional ISA extensions on Grace when mcpu=grace
is used: sve2-sm4, sve2-aes, sve2-sha3.
Grace is no longer an alias, but a separate CPU definition.
[lldb] Add LineTable::{upper,lower}_bound (llvm#127519)
The motivation is llvm#123622 and the fact that it is hard to find the last
line entry in a given range. `FindLineEntryByAddress(range_end-1)` is
the best we have, but it's not ideal because it has a magic -1 and because
it relies on there existing a line entry at that address (generally, it
should be there, but if for some reason it isn't, we might end up ignoring
the entries that are there, or -- like my incorrect fix in llvm#123622 did
-- iterating through the entire line table).
What we really want is to get the last entry that exists in the given
range. Or, equivalently (and more STL-like) the first entry after that
range. This is what these functions do. I've used the STL names since
they do pretty much exactly what the standard functions do (the main
head-scratcher comes from the fact that our entries represent ranges
rather than single values).
The functions can also be used to simplify the maze of `if` statements
in `FindLineEntryByAddress`, but I'm keeping that as a separate patch.
For now, I'm just adding some unit testing for that function to gain
more confidence that the patch does not change the function behavior.
Co-authored-by: Jonas Devlieghere
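The idea of finding "the first entry after the range" and stepping back one can be sketched with the standard algorithms on a simplified table. `Entry` and `last_entry_in_range` are hypothetical stand-ins keyed only on start addresses; they are not the lldb `LineTable` API and ignore the range-valued entries mentioned above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified line entry: only its start address matters here.
struct Entry {
  std::uint64_t start;
  int line;
};

// Returns the last entry whose start address lies in [begin, end), or nullptr
// if there is none. `table` must be sorted by start address.
inline const Entry* last_entry_in_range(const std::vector<Entry>& table,
                                        std::uint64_t begin, std::uint64_t end) {
  auto starts_before = [](const Entry& e, std::uint64_t addr) {
    return e.start < addr;
  };
  // First entry that starts at or after `begin`.
  auto first = std::lower_bound(table.begin(), table.end(), begin, starts_before);
  // First entry that starts at or after `end`, i.e. the first one past the range.
  auto past = std::lower_bound(first, table.end(), end, starts_before);
  if (past == first)
    return nullptr;  // No entry starts inside the range.
  return &*(past - 1);
}
```

Unlike a lookup at `range_end - 1`, this does not depend on an entry existing at one particular address, and it never walks the whole table.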
[NFC][MLIR] Make file-local cl::opt global variables static (llvm#126714)
This is per style-guide: make file-scope symbol static whenever possible.
Fix llvm#125983.
AMDGPU: Avoid double attribute lookup for register count attributes (llvm#127782)
[libclc] Clean up directory search procedure (llvm#127783)
During a recent change, the build system accidentally dropped the
(theoretical) support for the CLC builtins library to build
target-specific builtins from the 'amdgpu' directory, due to a change in
variable names. This functionality wasn't being used but was spotted
during another code review.
This commit takes the opportunity to clean up and better document the
code that manages the list of directories to search for builtin
implementations.
While fixing this, some references to now-removed SOURCES files were
discovered which have been cleaned up.
[NVPTX] Add tcgen05.cp/shift intrinsics (llvm#127669)
This patch adds intrinsics for tcgen05.cp and
tcgen05.shift instructions.
lit tests are added and verified with a
ptxas-12.8 executable.
Docs are updated in the NVPTXUsage.rst file.
Signed-off-by: Durgadoss R
[mlir][dlti] Fix query keys preallocation (llvm#127786)
Fixes upfront space allocation after llvm#126716
[libc++] Guard include of <features.h> with __has_include (llvm#127691)
Some configurations define `__AMDGPU__` or `__NVPTX__` on platforms that
don't provide <features.h>, such as CUDA on Mac.
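The guard pattern named in the title looks roughly like the sketch below. The macro `DEMO_HAS_FEATURES_H` and the function `has_features_header` are made up for the example and do not appear in libc++; the actual condition used by the patch may combine `__has_include` with other checks.

```cpp
// Only include <features.h> when the preprocessor can actually find it.
#if defined(__has_include)
#  if __has_include(<features.h>)
#    include <features.h>
#    define DEMO_HAS_FEATURES_H 1
#  endif
#endif

// Hypothetical marker so the rest of the code can tell which path was taken.
#ifndef DEMO_HAS_FEATURES_H
#  define DEMO_HAS_FEATURES_H 0
#endif

// Reports whether the header was found at compile time.
inline bool has_features_header() { return DEMO_HAS_FEATURES_H != 0; }
```

On a platform without the header, this compiles cleanly and simply skips the include instead of failing with a missing-file error.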
[clang][bytecode] Fix allocating primitive arrays of unknown bound (llvm#127788)
[libc++] Deprecate and remove member types of `hash` in `<variant>` (llvm#127758)
These member types were deprecated in C++17 by P0174R2 and removed in
C++20 by P0619R4, but the changes in `<variant>` seem missing.
Drive-by: Replace one `_NOEXCEPT` with `noexcept` as the `hash`
specialization is C++17-and-later only.
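For code that relied on the removed member types, a hedged sketch of one alternative is to derive the result type from the call expression. The `Value` alias and `hash_value` helper are hypothetical names; the example assumes a C++17 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

using Value = std::variant<int, std::string>;

// Derive the result type from the call expression instead of relying on the
// deprecated/removed `result_type` member typedef.
using HashResult =
    decltype(std::hash<Value>{}(std::declval<const Value &>()));
static_assert(std::is_same_v<HashResult, std::size_t>,
              "std::hash specializations return std::size_t");

// Small helper that hashes a variant without naming any member typedef.
std::size_t hash_value(const Value &v) { return std::hash<Value>{}(v); }
```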
[NFC] using isUndef() instead of getOpcode() == ISD::UNDEF (llvm#127713)
[LAA] Make Ptr argument optional in isNoWrap. (llvm#127410)
Update isNoWrap to make the IR Ptr argument optional. This allows using
isNoWrap when dealing with things like pointer-selects, where a select
is translated to multiple pointer SCEV expressions, but there is no IR
value that can be used. We don't try to retrieve pointer values for the
pointer SCEVs and using info from the IR would not be safe. For example,
we cannot use inbounds, because the pointer may never be accessed.
PR: llvm#127410
[MLIR][Linalg] Introduce Python API for linalg.batch_matmul Ops. (llvm#127614)
As linalg.batch_matmul has been moved into tablegen from OpDSL, its
derived python wrapper no longer exists. This patch adds the required
python wrapper.
Also refactors the BatchmatmulOp printer to make it consistent with its
parser.
Revert "[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (llvm#116984)"
This reverts commit 7763119 (leaving the modifications from 03cb46d).
[clang] Lower modf builtin using `llvm.modf` intrinsic (llvm#126750)
This updates the existing `modf[f|l]` builtin to be lowered via the
`llvm.modf.*` intrinsic (rather than directly to a library call).
flang: Fix build with latest libc++ (llvm#127362)
I think this first stopped working with
9548366. This patch fixes the following
error:
/home/runner/work/llvm-project/llvm-project/flang/runtime/io-api-minimal.cpp:153:11:
error: '__libcpp_verbose_abort' is missing exception specification
'noexcept'
153 | void std::__libcpp_verbose_abort(char const format, ...) {
| ^
| noexcept
/mnt/build/bin/../include/c++/v1/__verbose_abort:30:28: note: previous
declaration is here
30 | printf, 1, 2) void __libcpp_verbose_abort(const char __format,
...) _LIBCPP_VERBOSE_ABORT_NOEXCEPT;
| ^
1 error generated.
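The diagnostic above comes from a definition whose exception specification does not match the earlier declaration. A minimal sketch of the matching form, using a hypothetical stand-in function (`demo_verbose_abort`, which only prints and does not abort) rather than the real libc++ hook:

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Hypothetical stand-in for a library declaration that is marked noexcept.
void demo_verbose_abort(const char *format, ...) noexcept;

// The definition has to repeat `noexcept`; dropping it here is what produces
// a "missing exception specification" style error.
void demo_verbose_abort(const char *format, ...) noexcept {
  va_list args;
  va_start(args, format);
  std::vfprintf(stderr, format, args); // demo only: print, do not abort
  va_end(args);
}

// The noexcept operator is unevaluated, so this does not call the function.
static_assert(noexcept(demo_verbose_abort("x")),
              "declaration and definition agree on noexcept");
```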
[flang] Implement !DIR$ UNROLL_AND_JAM [N] (llvm#125046)
This patch implements support for the UNROLL_AND_JAM directive to enable
or disable unrolling and jamming on a `DO LOOP`.
It must be placed immediately before a `DO LOOP` and applies only to the
loop that follows. N is an integer that specifies the unrolling factor.
This is done by adding an attribute to the branch into the loop in LLVM
to indicate that the loop should be unrolled and jammed.
[MLIR][NFC] Use base alias for constructor inheritance (llvm#127756)
During my previous cleanup (llvm#127403), I did not notice that we defined a
type alias for the base class. This type alias allows us to use the
shorter form Base::Base, and this PR switches to that.
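As an illustration of the shorter form, here is a small self-contained sketch. `PatternBase` and `MyPattern` are hypothetical classes, not MLIR types; only the `using Base::Base;` idiom is the point.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical CRTP-style base with a long template name.
template <typename Derived>
class PatternBase {
public:
  explicit PatternBase(std::string name) : name_(std::move(name)) {}
  const std::string &name() const { return name_; }

private:
  std::string name_;
};

class MyPattern : public PatternBase<MyPattern> {
public:
  // Type alias for the base class.
  using Base = PatternBase<MyPattern>;

  // Inherit the base constructors through the alias. The long form would be
  // `using PatternBase<MyPattern>::PatternBase;`.
  using Base::Base;
};
```

`MyPattern("fold-add")` then constructs through the inherited constructor without the derived class spelling out the full base type.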
[libclc][NFC] Reformat ep_log.cl
[clang][Sema] Fix initialization of `NonTypeTemplateParmDecl`... (llvm#121768)
...when there are invalid constraints.
When attaching a `TypeConstraint`, in case of error, the trailing
pointer that is supposed to point to the constraint is left
uninitialized.
Sometimes the uninitialized value will be a `nullptr`, but at other
times it will not. If we traverse the AST (for instance, dumping it, or
when writing the BMI), we may get a crash depending on the value that
was left. The serialization may also contain a bogus value.
In this commit, we always initialize the `PlaceholderTypeConstraint`
with `nullptr`, to avoid accessing this uninitialized memory.
This does not affect only modules, but it causes a segfault more
consistently when they are involved.
The test case was reduced from `mp-units`.
Co-authored-by: Erich Keane [email protected]
[llvm-dwarfdump] Print number of out-of-line functions described by DWARF (llvm#127233)
Some of the functions in `#functions` may have several inlined
instances, but also an out-of-line definition.
Therefore, for complex enough DWARF input, `#functions` -
`#inlined functions` would not give us the number of out-of-line function
definitions.
`llvm-dwarfdump`, however, already keeps track of those; print it as
part of the statistics, as this number is useful in certain scenarios.
[libc] add `LLVM_LIBC_CAST` macro. (llvm#127319)
related: llvm#127238
This patch adds a macro called `LLVM_LIBC_CAST`, similar to
`__BIONIC_CAST`, for type conversion in `endian.h`.
[Clang] Do not try to transform invalid bindings (llvm#125658)
In the presence of an invalid structured binding decomposition, some
binding packs may be invalid and trying to transform them would produce
a recovery expression that does not contain a pack, leading to
assertions in places where we would expect a pack at that stage.
Fixes llvm#125165
[Clang] Add release note for llvm#127623 (llvm#127815)
While reviewing llvm#127623, I missed that it didn’t have a release note.
[libc++] Avoid code duplication in strings operator+ overloads (llvm#126048)
[Analysis] Avoid repeated hash lookups (NFC) (llvm#127743)
[AsmPrinter] Avoid repeated hash lookups (NFC) (llvm#127744)
[CodeGen] Avoid repeated hash lookups (NFC) (llvm#127745)
[Object] Avoid repeated hash lookups (NFC) (llvm#127746)
[Support] Avoid repeated hash lookups (NFC) (llvm#127747)
[X86] Avoid repeated hash lookups (NFC) (llvm#127748)
[libclc] Explicitly qualify private address spaces (llvm#127823)
Doing so provides stability when compiling the builtins in a mode in
which unqualified pointers may be interpreted as being in the generic
address space, such as in OpenCL 3.0.
We eventually want to provide 'generic' overloads of the builtins in
libclc so this prepares the ground a little better.
It could be argued that having the internal CLC helper functions be
unqualified is more flexible, in case it's better for a target to have
the pointers in the generic address space. This commits to the private
address space for more stability across different OpenCL environments.
[lldb] Make GetOutputStreamSP and GetErrorStreamSP protected (llvm#127682)
This makes GetOutputStreamSP and GetErrorStreamSP protected members of
Debugger. Users who want to print to the debugger's stream should use
GetAsyncOutputStreamSP and GetAsyncErrorStreamSP instead and the few
remaining stragglers have been migrated.
Revert "Reapply [CaptureTracking][FunctionAttrs] Add support for CaptureInfo (llvm#125880)"
This reverts commit 0fab404.
Seems to break LTO builds of clang on Windows, see comments on
llvm#125880
[lldb] Gardening in StreamAsynchronousIO (NFC) (llvm#127717)
A handful of minor improvements to StreamAsynchronousIO:
[libc++] Optimize ranges::move{,_backward} for vector::iterator (llvm#121109)
As a follow-up to llvm#121013 (which optimized `ranges::copy`) and
llvm#121026 (which optimized `ranges::copy_backward`), this PR enhances the
performance of `std::ranges::{move, move_backward}` for
`vector<bool>::iterator`, addressing a subtask outlined in issue llvm#64038.
The optimizations bring performance improvements analogous to those
achieved for the `{copy, copy_backward}` algorithms: up to 2000x for
aligned moves and 60x for unaligned moves. Moreover, comprehensive
tests covering up to 4 storage words (256 bytes) with odd and even bit
sizes are provided, which validate the proposed optimizations in this
patch.
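The gap between bit-by-bit and word-at-a-time transfers for the aligned case can be sketched as below. This is not the libc++ implementation; the function names are hypothetical, storage words are assumed to be 64 bits, and both ranges are assumed to start at bit 0 of a word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Naive version: transfers one bit per iteration.
void copy_bits_naive(const std::uint64_t *src, std::uint64_t *dst,
                     std::size_t nbits) {
  for (std::size_t i = 0; i < nbits; ++i) {
    const std::uint64_t mask = std::uint64_t{1} << (i % 64);
    if (src[i / 64] & mask)
      dst[i / 64] |= mask;
    else
      dst[i / 64] &= ~mask;
  }
}

// Word-wise version for the aligned case: whole words are copied directly,
// and only the partial last word needs masking.
void copy_bits_aligned(const std::uint64_t *src, std::uint64_t *dst,
                       std::size_t nbits) {
  const std::size_t full_words = nbits / 64;
  for (std::size_t w = 0; w < full_words; ++w)
    dst[w] = src[w];

  const std::size_t tail_bits = nbits % 64;
  if (tail_bits != 0) {
    // Keep the destination bits above the tail untouched.
    const std::uint64_t mask = (std::uint64_t{1} << tail_bits) - 1;
    dst[full_words] = (dst[full_words] & ~mask) | (src[full_words] & mask);
  }
}
```

Both functions produce the same destination contents; the second one does a single store per 64 bits instead of 64 read-modify-write steps.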
[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (llvm#127240)
The previous PR llvm#122950 was
reverted because it hit a buildbot failure. Another patch was merged
while that PR was under review, leaving one test out of date.
This reopens the PR with the issue fixed.
[flang] AliasAnalysis: Handle fir.load on hlfir.designate (llvm#127107)
For example, determine that the address in `obj%p` below cannot alias
the address of `v`:
[clang] handle fp options in __builtin_convertvector (llvm#125522)
This patch allows using fpfeatures pragmas with __builtin_convertvector:
it to ConvertVectorExpr
fpfeatures contained in ConvertVectorExpr
[AMDGPU] Remove unused variables. NFC
[CIR] Add additional frontend actions (llvm#127249)
Add frontend actions to support emitting assembly, bitcode, and object
files when compiling with ClangIR. This change also correctly sets and
propagates the target triple in the MLIR and LLVM modules, which was a
necessary prerequisite for emitting assembly and object files.
[InstCombine] avoid extra instructions in foldSelectICmpAnd (llvm#127398)
Disable fold when it will result in more instructions.
[ELF,test] Remove unneeded -o /dev/null
When the script has executed `cd %t`, it is fine to use the output
file `a.out`.
(We don't want to rely on lit's default PWD to support lit compatible
runners. Therefore -o /dev/null is used when PWD has not been changed
to a %t derived path.)
Remove header file spuriously added by 9905728.
[InstCombine] handle trunc to i1 in foldSelectICmpAndBinOp (llvm#127390)
for `trunc nuw` saves an instruction and otherwise only other
instructions without the select, same behavior as for bit test before.
proof: https://alive2.llvm.org/ce/z/a6QmyV
[ELF,test] Clean up aarch64-relocs.s
[SLP][NFC]Replace undefs by zeroinitializer
[libc++][test] Refactor tests for std::{copy, move, fill} algorithms (llvm#120909)
This refactor includes the following changes:
`types::for_each` to remove redundant code;
`type_algorithms.h` in some test files;
(e.g., `test5()`, `test6()`) but belong to the same kind are now grouped
into one function (`test_struct_array()`).
Include test folder in the Clang Static Analyzer team mentions (llvm#127810)
See
https://discourse.llvm.org/t/taking-ownership-of-clang-test-analysis/84689
[VPlan] Remove dead exit block handling code in HCFGBuilder.
The mapping of IR ExitBB to a VPBB isn't used. It also sets an incorrect
VPBB for the ExitBB; the region's successor is the middle block, not the
exit block.
It also unnecessarily triggers an assertion after 38376de.
[X86] combineX86ShuffleChain - provide list of combined shuffle nodes, replace HasVariableMask bool arg. NFC. (llvm#127826)
Minor NFC refactor before making better variable mask combining decisions - isTargetShuffleVariableMask doesn't discriminate between fast (AND, PSHUFB etc.) and slow (VPERMV3 etc.) variable shuffles, so an opaque HasVariableMask is only of limited use.
[clangd] Avoid round-trip from SourceLocation to clangd::Range and back in SymbolCollector::handleMacros() (llvm#127757)
[mlir][tosa] Align check variables naming to use capitals (llvm#127830)
Move to using capital letters for variable captures in LIT.
Signed-off-by: Georgios Pinitas [email protected]
[Clang] [Sema] Combine fallout warnings to just one warning (llvm#127546)
This merges several falloff and noreturn-related warnings and
removes unused diagnostic arguments.
Changes:
`warn_maybe_falloff_nonvoid_function` and `warn_falloff_nonvoid_function`,
`warn_maybe_falloff_nonvoid_coroutine` and `warn_falloff_nonvoid_coroutine`,
`warn_maybe_falloff_nonvoid_lambda` and `warn_falloff_nonvoid_lambda`
were combined into `warn_falloff_nonvoid`,
`err_maybe_falloff_nonvoid_block` and `err_falloff_nonvoid_block` were
combined into `err_falloff_nonvoid`
`err_noreturn_block_has_return_expr` and
`err_noreturn_lambda_has_return_expr` were merged into
`err_noreturn_has_return_expr` with the same semantics as
`warn_falloff_nonvoid` or `err_falloff_nonvoid`.
Removed some diagnostic args that weren’t being used by the diagnostics.
[GlobalOpt] Don't query TTI on a llvm.memcpy declaration. (llvm#127760)
Querying TTI creates a Subtarget object, but an llvm.memcpy declaration
doesn't have target-cpu and target-feature attributes like functions
with definitions. This can cause a warning to be printed on RISC-V
because the target-abi in the Module requires floating point, but the
subtarget features don't enable floating point. So far we've only seen
this in LTO when an -mcpu is not supplied for the TargetMachine.
To fix this, get TTI for the calling function instead.
Fixes the issue reported here
llvm#69780 (comment)
[MC] Remove MCRegister::isStackSlot. (llvm#127755)
Stack slots should only be stored in Register. The only caller was
Register::isStackSlot so just inline it there.
[mlir][tosa] Update SelectOp's input names to match TOSA specification (llvm#127833)
Updated:
Signed-off-by: Jerry Ge [email protected]
[clangd] Update XRefs to support overridden ObjC methods (llvm#127109)
[MLIR] Fix doc build (NFC)
AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (llvm#127711)
For targets that support IEEE fminimum_num/fmaximum_num, the
corresponding _min_num_fXY/_max_num_fXY instructions themselves
already did the canonicalization for the inputs. As a result, we do not
need to explicitly canonicalize the inputs for fminnum/fmaxnum.
[Modules] Add `clang/Lex/HLSLRootSignatureTokenKinds.def` to clang's `modulemap` (llvm#127839)
b41b86a added a new textual header
`clang/Lex/HLSLRootSignatureTokenKinds.def` but did not add it to
`clang`'s module map. This causes build failure when building llvm with
`-DLLVM_ENABLE_MODULES=ON`. This PR adds the new textual header to the
module map and fixes the build break.
Fixing rdar://145148093.
[HLSL] Implement the 'and' HLSL function (llvm#127098)
Addresses llvm#125604
`and` as an HLSL builtin function
`and` HLSL builtin function gets lowered to the LLVM `and` instruction
[MLIR][LLVMIR] Add support for asin acos atan intrinsics op (llvm#127317)
This is similar to llvm#125748
[RISCV] Add a pass to remove ADDI by reassociating to fold into load/store address. (llvm#127151)
SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.
One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array then 2 different SHXADD instructions will be used to
add the scaled indices. Finally the SHXADD will be used by different
load instructions. We can remove the ADDI by folding the offset into
each load.
This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.
This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.
This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.
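The "array in a struct with two different indices" case can be illustrated with a small source-level sketch. The `Record` type and function names are hypothetical, and no claim is made about the exact instructions any compiler emits for it; the second function only spells out the address arithmetic by hand.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct: the array sits at a constant, non-zero offset.
struct Record {
  long header[4];
  int values[64];
};

// Two loads from the same array with different indices. Each address is
// base + constant offset + scaled index, so the constant part is shared.
int sum_two(const Record *r, long i, long j) {
  return r->values[i] + r->values[j];
}

// The same computation with the address arithmetic written out: the constant
// offset is applied at each load instead of being added to the base up front.
int sum_two_explicit(const Record *r, long i, long j) {
  const char *base = reinterpret_cast<const char *>(r);
  const std::size_t offset = offsetof(Record, values);
  const int *a = reinterpret_cast<const int *>(base + i * sizeof(int) + offset);
  const int *b = reinterpret_cast<const int *>(base + j * sizeof(int) + offset);
  return *a + *b;
}
```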
[OpenMP][NFC] Remove unused clock function types and globals (llvm#127684)
[OpenMP][NFC] Remove unused __kmp_dispatch_lock global (llvm#127686)
[clang][perf-training] Support excluding LLVM build from PGO training (llvm#126876)
Using LLVM build itself for PGO training is convenient and a great
starting point but it also has several issues:
compiler and if those tools aren't available in PATH, the build will
fail.
always be available in the default location requiring an explicit
sysroot.
exercises the -O3 pipeline and can pessimize other configurations.
linker (beyond what the CMake itself does).
Rather than using LLVM build, ideally we would provide a more minimal,
purpose built corpus. While we're working on building such a corpus,
provide a CMake option that lets vendors disable the use of the LLVM
build for PGO training.
[MLIR] Fix rewrite of ops with vector operands to LLVM on GPU (llvm#127844)
There was a discrepancy between the type-converter and rewrite-pattern
parts of conversion to LLVM used in various GPU targets, at least ROCDL
and NVVM:
converting them to nests of `!llvm.array< ... >` with a vector at the
inner-most dimension:
https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp#L629-L655
`llvm.array`:
https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp#L594-L596
That led to conversion failures when lowering `math` dialect ops on
rank-2 vectors, as in the testcase being added in this PR.
This PR fixes this by reusing a shared utility already used in other
conversions to LLVM:
https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/LLVMCommon/VectorPattern.cpp#L80-L104
Signed-off-by: Benoit Jacob [email protected]
[SandboxIR] OpaqueValue (llvm#127699)
This patch implements a new subclass of the Value class used for Sandbox
IR Values that we don't support, like metadata or inline asm. The goal
is to never have null sandboxir::Value objects, because this is not the
expected behavior.
Revert "[RISCV] Add a pass to remove ADDI by reassociating to fold into load/store address. (llvm#127151)"
This reverts commit c3ebbfd.
Seeing some test failures on the build bot.
[mlir][Linalg] Fix linalg.generic iteration domain collapse for dynamic dims (llvm#118208)
This PR fixes how the iteration domain of linalg.generic is collapsed
when fusing with tensor.expand_shape. Previously, the output_shape for
tensor.expand_shape was inferred, which only works in some special
cases.
This patch makes the logic explicitly set the bounds of the new
collapsed iteration domain, because we already know them.
Co-authored-by: Jakub Kuderski [email protected]
Recommit "[RISCV] Add a pass to remove ADDI by reassociating to fold into load/store address. (llvm#127151)"
Tests have been re-generated with recent scheduler changes.
Original message:
SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.
One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array then 2 different SHXADD instructions will be used to
add the scaled indices. Finally the SHXADD will be used by different
load instructions. We can remove the ADDI by folding the offset into
each load.
This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.
This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.
This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.
[gn build] Port 26e3750
[DSE] Make iter order deterministic in removePartiallyOverlappedStores. NFC (llvm#127678)
In removePartiallyOverlappedStores we iterate over
InstOverlapIntervalsTy which is a DenseMap. Change that map into using
MapVector to ensure that we apply the transforms in a deterministic
order. I've only seen that the order matters if starting to use names
for the instructions created when doing the transforms. But such things
are a bit annoying when debugging etc.
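The idea behind switching to an insertion-ordered container can be sketched with standard library types. `OrderedMap` below is a hypothetical minimal stand-in, not `llvm::MapVector`: a vector keeps entries in insertion order and a hash map provides lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal insertion-ordered map: iteration order depends only on the order of
// first insertion, never on hashing or pointer values.
template <typename K, typename V>
class OrderedMap {
public:
  using Entry = std::pair<K, V>;

  // Returns the value for `key`, default-constructing it on first use.
  V &operator[](const K &key) {
    auto found = index_.find(key);
    if (found != index_.end())
      return entries_[found->second].second;
    index_.emplace(key, entries_.size());
    entries_.emplace_back(key, V{});
    return entries_.back().second;
  }

  typename std::vector<Entry>::const_iterator begin() const {
    return entries_.begin();
  }
  typename std::vector<Entry>::const_iterator end() const {
    return entries_.end();
  }
  std::size_t size() const { return entries_.size(); }

private:
  std::unordered_map<K, std::size_t> index_; // key -> position in entries_
  std::vector<Entry> entries_;               // entries in insertion order
};

// Collects the keys in iteration order, for demonstration purposes.
std::vector<std::string> keys_in_order(const OrderedMap<std::string, int> &m) {
  std::vector<std::string> keys;
  for (const auto &entry : m)
    keys.push_back(entry.first);
  return keys;
}
```

Updating an existing key does not move it, so any transform applied while iterating runs in the same order on every execution.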
[GlobalOpt] Remove Function* argument from tryWidenGlobalArrayAndDests. NFC (llvm#127848)
This is only used to get the Module and the LLVMContext. We can get both
of those from the GlobalVariable*.
[mlir][Vector] Deprecate vector.extractelement/vector.insertelement (llvm#113829)
See
https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops/71116/6
for more information.
[MemProf] Support cloning through recursive cycles (llvm#127429)
In order to facilitate cloning of recursive cycles, we first identify
backedges using a standard DFS search from the root callers, then
initially defer recursively invoking the cloning function via those
edges. This is because the cloning opportunity along the backedge may
not be exposed until the current node is cloned for other non-backedge
callers that are cold after the earlier recursive cloning, resulting
in a cold predecessor of the backedge. So we recursively invoke the
cloning function for the backedges during the cloning of the current
node for its caller edges (which were sorted to enable handling cold
callers first).
There was no significant time or memory overhead measured for several
large applications.
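The first step, identifying backedges with a standard DFS from the roots, can be sketched on a plain adjacency-list graph. This is a generic illustration, not the MemProf context graph or its cloning logic; the `Graph` alias and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Adjacency list: graph[n] holds the nodes reachable from n by one edge.
using Graph = std::vector<std::vector<int>>;

enum class Color { White, Grey, Black }; // unvisited, on DFS stack, finished

// Recursive DFS step: an edge into a node that is still on the DFS stack
// (Grey) closes a cycle and is recorded as a backedge.
void visit(const Graph &graph, int node, std::vector<Color> &color,
           std::vector<std::pair<int, int>> &backedges) {
  color[static_cast<std::size_t>(node)] = Color::Grey;
  for (int next : graph[static_cast<std::size_t>(node)]) {
    Color c = color[static_cast<std::size_t>(next)];
    if (c == Color::Grey)
      backedges.emplace_back(node, next);
    else if (c == Color::White)
      visit(graph, next, color, backedges);
  }
  color[static_cast<std::size_t>(node)] = Color::Black;
}

// Runs the DFS from each root and returns all backedges as (from, to) pairs.
std::vector<std::pair<int, int>> find_backedges(const Graph &graph,
                                                const std::vector<int> &roots) {
  std::vector<Color> color(graph.size(), Color::White);
  std::vector<std::pair<int, int>> backedges;
  for (int root : roots)
    if (color[static_cast<std::size_t>(root)] == Color::White)
      visit(graph, root, color, backedges);
  return backedges;
}
```

A later pass can then treat the recorded edges specially, for example by deferring work along them until the non-backedge callers have been handled.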
[InstCombine] Test for regression with trunc in foldSelectICmpAnd
[SandboxIR][NFC] Change order of ifs in Context::getOrCreateValueInternal() (llvm#127891)
Move the most common if statement to the top and the least common ones
to the bottom. This should save CPU cycles during compilation.
This patch also prefixes the llvm variables with the LLVM prefix to make
the naming convention in this function more uniform. For example `C`
to `LLVMC`.
Check the type of Objective-C++ instance variables in WebKit member variable checkers. (llvm#127570)
Like a C++ member variable, every Objective-C++ instance variable must
be a RefPtr, Ref, CheckedPtr, or CheckedRef to an object, not a raw
pointer or reference.
adding rootsignature to obj2yaml
adding test
removing old test
remove useless includes
addressing comments
updating test
removing useless header
fix formating
renaming test
addressing pr comments
adding str to ROOT_ELEMENT_FLAG
formating
refactoring to follow llvm standards
addressing comments
clean up
remove version
fix pr
adding dxil-dis test
adding compatibility test
addressing test concerns
clean up
addressing comments
adding fail test
adding comment
adding few more tests
format
cleanup
adding metadata extraction
moving root signature to it's own pass
formating
removing useless imports
fixing pr changes
adding some asserts
format
fixing assert
cleaning
clean up
addressing comments
removing version
fix test
addressing PR Comments
fix test
filtering root signatures not associated with entry function
separating parsing and validation
improve error handling
clean up
clean up
formating
addressing comments and fix tests
formating
addressing pr comments
addressing PR comments
addressing pr comments
removing copies from root signature use in dx container globals
adding more tests
maybe fix test?
try fix format
removing test
adding llvm unreachable and testing test
stopping compilation if root signature error were emitted
making sure Error tests fail
adding root constants
refactoring root signature analysis to return a map instead
addressing pr comments
clean up
addressing pr comments
implementing find interface for RootSignatureAnalysisWrapper
adding test for null function
fix root signature test error
fix other functions are checked
adding missing continue
adding few more tests
adding yaml2obj support
adding support for obj2yaml and initial tests
multiple parameters support and more testing
clean up
fixing formating
reapply rebase fix
clean up
addressing pr comments
first working version
formating
moving the offset logic to it's own class
refactoring to remove use of map and string
addressing comments
using buffer_ostream
remove getsize
clean up
clean up
clean up
clean up
addressing pr comments
clean up
Merge branch 'refactor/improve-offset-calculation' into users/joaosaffran/127840
clean up
moving initializer arround
addressing pr comments