[libc++] Set feature-test macro `__cpp_lib_atomic_float` (#127559) #1

joaosaffran · 2025-02-26T22:34:28Z

[libc++] Set feature-test macro __cpp_lib_atomic_float (llvm#127559)

The corresponding feature was implemented in LLVM 18 (by llvm#67799), but
this FTM wasn't added before.
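
A minimal sketch of how user code might consume this macro, assuming a C++20 toolchain. The helper name atomic_add is hypothetical; the fallback branch is only there so the snippet still builds where the macro is absent.

```cpp
#include <atomic>
#if __has_include(<version>)
#  include <version>  // defines the library feature-test macros
#endif

// Hypothetical helper: atomically adds `delta` to `target` and returns the
// previous value. Uses the C++20 floating-point fetch_add when the
// feature-test macro reports it, otherwise a compare-exchange loop.
inline float atomic_add(std::atomic<float>& target, float delta) {
#if defined(__cpp_lib_atomic_float) && __cpp_lib_atomic_float >= 201711L
  return target.fetch_add(delta);
#else
  float old = target.load();
  while (!target.compare_exchange_weak(old, old + delta)) {
    // On failure `old` is refreshed with the current value; retry.
  }
  return old;
#endif
}
```

Both branches return the value held before the addition, so callers do not need to know which one was compiled.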

[InstCombine] Propagate flags in foldSelectICmpAndBinOp (llvm#127437)

It is always safe to add poison-generating flags for BinOp Y, Identity.
Proof: https://alive2.llvm.org/ce/z/8BLEpq
and https://alive2.llvm.org/ce/z/584Bb4

Then we can propagate flags from one of the arms:

select Cond, Y, (BinOp flags Y, Z) ->
select Cond, (BinOp flags Y, Identity), (BinOp flags Y, Z) ->
BinOp flags Y, (select Cond, Identity, Z)

This patch is proposed to avoid information loss caused by
llvm#127390.
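
The algebra behind the fold can be sketched in plain C++ rather than LLVM IR. This assumes BinOp is an unsigned add, whose identity is 0; the function names are hypothetical, and poison-generating flags such as nuw/nsw are not modeled here.

```cpp
#include <cstdint>

// Original form: select Cond, Y, (BinOp Y, Z), with BinOp = add.
constexpr std::uint32_t select_of_binop(bool cond, std::uint32_t y,
                                        std::uint32_t z) {
  return cond ? y : y + z;
}

// Folded form: BinOp Y, (select Cond, Identity, Z); the identity of add is 0.
constexpr std::uint32_t binop_of_select(bool cond, std::uint32_t y,
                                        std::uint32_t z) {
  return y + (cond ? 0u : z);
}

static_assert(select_of_binop(true, 7u, 5u) == binop_of_select(true, 7u, 5u),
              "both forms agree when the condition is true");
static_assert(select_of_binop(false, 7u, 5u) == binop_of_select(false, 7u, 5u),
              "both forms agree when the condition is false");
```

The sketch only shows that the two shapes compute the same value; the flag-propagation argument itself rests on the Alive2 proofs linked above.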

[PGO][test] Specify exact match for check labels (llvm#117376)

Some check labels in the basic.ll test are simply f[1-3], which matches
all cases where the function entry is hot, cold, or uncategorized. Since
the actual test results for each label have mixed cases, the current
labels can be considered ambiguous.

This patch specifies exact matches for each label to ensure more precise
validation and to prevent potential regressions in the future.

[lld][WebAssembly] Fix warnings in test. NFC (llvm#127714)

[clang][X86] Only define CET macro for X86 targets (llvm#127616)

The -fcf-protection flag is now also used to enable CFI features for
the RISC-V target, so it's not suitable to define __CET__ solely based
on the flag anymore. This patch moves the definition of the __CET__
macro into X86 target hook, so only X86 targets with the
-fcf-protection flag would enable the __CET__ macro.

See llvm#109784 and
llvm#112477 for the adoption
of -fcf-protection flag for RISC-V targets.

[HLSL] Allow arrays to copy-initialize (llvm#127557)

This change allows array variables to copy-initialize from other arrays.
It also corrects a small error in HLSL C-style casting, which did not
report an error when casting to arrays if the elementwise and splat
conversions failed.

Fixes llvm#127551

[X86] Add missing vNbf16 handling in X86CallingConv.td file (llvm#127102)

Lack of these entries caused clang to crash on the following code:

__m256bh fun(__m256bh arg) {
	return arg;
	
}
__m256bh run() {
    __m256bh arg= {0};
    fun(arg);
}

This caused FastISel to fail, since it lowers calls based on the
X86CallingConv table.

Curiously, if FastISel fails somewhere down the line and
SelectionDAGISel takes over as the fallback, the crash does not occur. The
following code does not crash:

__m256bh fun(__m256bh arg) {
	return arg;
}

__m256bh run() {
    __m256bh arg= {0};
    return fun(arg);

}

This is puzzling to me. Obviously, if FastISel fails, the compiler
falls back to something else to lower these calls -- but since the
X86CallingConv table doesn't have entries for vNbf16, how does this
other thing manage not to crash? It has to use some other mechanism, one
which doesn't use the table. This raises the following questions:

How is this lowering accomplished without, presumably, using the
CallingConv entries?
Why is the table not used? I mean, this points to some logic
duplication (the FastISel way vs. the other, bug-free way).
How to properly test this? There is a test for vNbf16 values, but it
also must not be using the FastISel path? This duplication of logic
makes it hard to test this, since we don't have direct control over whether
the FastISel path or the other one is used.

Nonetheless, this PR fixes the crash, though I didn't create a test for
it, since I am not yet sure what it should look like. I would like to learn
how the working non-FastISel mechanism works; I tried looking for it,
but haven't managed to find anything yet.

[AMDGPU] Handle lowering addrspace casts from LDS to FLAT address in amdgpu-sw-lower-lds. (llvm#121214)

"infer-address-spaces" pass replaces all refinable generic pointers with
equivalent specific pointers.

At -O0 optimisation level, infer-address-spaces pass doesn't run in the
pipeline.

"amdgpu-sw-lower-lds" pass instruments memory operations on addrspace(3)
ptrs. Since extra addrspacecasts from LDS to flat addrspaces are present
at -O0, and the actual store/load memory instructions now operate on the
flat addrspace, these addrspacecasts need to be handled in the
amdgpu-sw-lower-lds pass itself. This patch first lowers the LDS ptr to
the corresponding ptr in the global memory from the asan_malloc. It then
replaces the original cast with an addrspacecast from the global ptr to a
flat ptr.

AMDGPU: Teach isOperandLegal about SALU literal restrictions (llvm#127626)

isOperandLegal mostly implemented the VALU operand rules, and
largely ignored SALU restrictions. This theoretically avoids
folding literals into SALU insts which already have a literal
operand. This issue is currently avoided due to a bug in
SIFoldOperands; this change will allow using raw operand
legality rules.

This breaks the formation of s_fmaak_f32 in SIFoldOperands,
but it probably should not have been forming there in the first
place. TwoAddressInsts or RA should generally handle that,
and this only worked by accident.

[Sparc] Use MCRegister. NFC

[NFC][AMDGPU] Pre-commit a test case of checking register coalescer on v_pk_mov_b32 (llvm#127715)

This PR serves as a preliminary step, adding a test case for register coalescer on v_pk_mov_b32. It is intended to demonstrate the code changes introduced in an upcoming PR.

[Driver][MSVC] Pass profile file to lld-link via -lto-sample-profile option (llvm#127442)

In SPGO LTO mode, the linker needs the -lto-sample-profile option to set
the sample profile file.
Linux adds this option by translating -fprofile-sample-use to
-plugin-opt=sample-profile=, which is an alias of lto-sample-profile (in
clang\lib\Driver\ToolChains\CommonArgs.cpp: tools::addLTOOptions()).
But clang on Windows misses this translation, so add it now.

[mlir] Fix FunctionOpInterface impl for external func (llvm#124693)

For function declarations (i.e. the func op has no entry block), the
FunctionOpInterface methods insertArgument and eraseArgument will
cause a segfault. This PR guards against manipulation of an empty entry
block by checking whether the func op is external.

An example can be seen in google/heir#1324

The segfault trace

 #1 0x0000560f1289d9db PrintStackTraceSignalHandler(void*) /proc/self/cwd/external/llvm-project/llvm/lib/Support/Unix/Signals.inc:874:1
 #2 0x0000560f1289b116 llvm::sys::RunSignalHandlers() /proc/self/cwd/external/llvm-project/llvm/lib/Support/Signals.cpp:105:5
 #3 0x0000560f1289e145 SignalHandler(int) /proc/self/cwd/external/llvm-project/llvm/lib/Support/Unix/Signals.inc:415:1
 #4 0x00007f829a3d9520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #5 0x0000560f1257f8bc void __gnu_cxx::new_allocator<mlir::BlockArgument>::construct<mlir::BlockArgument, mlir::BlockArgument>(mlir::BlockArgument*, mlir::BlockArgument&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/new_allocator.h:162:23
 #6 0x0000560f1257f84d void std::allocator_traits<std::allocator<mlir::BlockArgument> >::construct<mlir::BlockArgument, mlir::BlockArgument>(std::allocator<mlir::BlockArgument>&, mlir::BlockArgument*, mlir::BlockArgument&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:520:2
 #7 0x0000560f12580498 void std::vector<mlir::BlockArgument, std::allocator<mlir::BlockArgument> >::_M_insert_aux<mlir::BlockArgument>(__gnu_cxx::__normal_iterator<mlir::BlockArgument*, std::vector<mlir::BlockArgument, std::allocator<mlir::BlockArgument> > >, mlir::BlockArgument&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/vector.tcc:405:7
 #8 0x0000560f1257cf7e std::vector<mlir::BlockArgument, std::allocator<mlir::BlockArgument> >::insert(__gnu_cxx::__normal_iterator<mlir::BlockArgument const*, std::vector<mlir::BlockArgument, std::allocator<mlir::BlockArgument> > >, mlir::BlockArgument const&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/vector.tcc:154:6
 #9 0x0000560f1257b349 mlir::Block::insertArgument(unsigned int, mlir::Type, mlir::Location) /proc/self/cwd/external/llvm-project/mlir/lib/IR/Block.cpp:178:13
#10 0x0000560f123d2a1c mlir::function_interface_impl::insertFunctionArguments(mlir::FunctionOpInterface, llvm::ArrayRef<unsigned int>, mlir::TypeRange, llvm::ArrayRef<mlir::DictionaryAttr>, llvm::ArrayRef<mlir::Location>, unsigned int, mlir::Type) /proc/self/cwd/external/llvm-project/mlir/lib/Interfaces/FunctionInterfaces.cpp:232:11
#11 0x0000560f0be6b727 mlir::detail::FunctionOpInterfaceTrait<mlir::func::FuncOp>::insertArguments(llvm::ArrayRef<unsigned int>, mlir::TypeRange, llvm::ArrayRef<mlir::DictionaryAttr>, llvm::ArrayRef<mlir::Location>) /proc/self/cwd/bazel-out/k8-dbg/bin/external/llvm-project/mlir/include/mlir/Interfaces/FunctionInterfaces.h.inc:809:7
#12 0x0000560f0be6b536 mlir::detail::FunctionOpInterfaceTrait<mlir::func::FuncOp>::insertArgument(unsigned int, mlir::Type, mlir::DictionaryAttr, mlir::Location) /proc/self/cwd/bazel-out/k8-dbg/bin/external/llvm-project/mlir/include/mlir/Interfaces/FunctionInterfaces.h.inc:796:7

[ELF] Improve INPUT_SECTION_FLAGS test

[AMDGPU] Add isAsCheapAsAMove for v_pk_mov_b32 (llvm#127632)

Co-authored-by: Matt Arsenault [email protected]

[RISCV][VLOPT] Add vfsqrt/vfrsqrt7 instruction to isSupportInstr (llvm#127462)

[clang] fix use after free in clang/tools/c-index-test/c-index-test.c (llvm#127063)

A recent change (e76739e) exposed a use-after-free in the
GetCursorSource() function, which returned a pointer to
a disposed CXString.

[mlir:python] Improve mlir_(attribute|type|value)_subclass for nanobind's stubgen (llvm#127584)

This PR makes several improvements to the stubs that are created by
mlir_(attribute|type|value)_subclass.

First, the PR sets the __module__ attribute of the classes generated
by the nanobind adaptors for attributes, types, and values (via
mlir_(attribute|type|value)_subclass). By default, the __module__
property is set to importlib._bootstrap, which isn't where we want the
new class to live. The new logic sets the property to the name of the
module provided as scope instead. Nanobind's stubgen ignores classes
whose __module__ does not correspond to the module it is generating
stubs for, so this change also makes it generate stubs for those classes
properly. This resolves llvm#127518.

Second, the PR overwrites the function signatures generated by stubgen
to a format that uses the desired type names (e.g., mlir.ir.Attribute
instead of MlirAttribute).

Finally, the PR piggy-backs some minor doc and style improvements to
PythonAdaptors.h.

Signed-off-by: Ingo Müller [email protected]

[c-index-test] Fix warnings

This patch fixes:

clang/tools/c-index-test/c-index-test.c:1240:15: error: mixing
declarations and code is a C99 extension
[-Werror,-Wdeclaration-after-statement]

clang/tools/c-index-test/c-index-test.c:1367:14: error: mixing
declarations and code is a C99 extension
[-Werror,-Wdeclaration-after-statement]

clang/tools/c-index-test/c-index-test.c:1468:14: error: mixing
declarations and code is a C99 extension
[-Werror,-Wdeclaration-after-statement]

[MLIR][emitc]: Remove unused functions (NFC)

[libc] Add strftime_l (llvm#127708)

This is a (no-op) locale version of strftime.

[AA] Consider extractvalue and extractelement as escape sources (llvm#127640)

CaptureTracking considers insertions into aggregates and vectors as
captures. As such, extractions from aggregates and vectors are escape
sources. A non-escaping identified local cannot alias with the result of
an extractvalue/extractelement.

Fixes llvm#126670.

Revert "[libc] Add strftime_l" (llvm#127766)

Reverts llvm#127708

[RISCVISel] Compute leading zeros for RISCVISD::VCPOP_VL node (llvm#127705)

This patch adds handling of the RISCVISD::VCPOP_VL node in
RISCVTargetLowering::computeKnownBitsForTargetNode. It eliminates
redundant zero-extension instructions.

[Flang][OpenMP] Skip threadprivate HostAssoc symbols for default privatization (llvm#127754)

Issue: Compilation abnormally terminates in parallel default(private)

Documentation reference:
A threadprivate variable must not appear as the base variable of a list
item in any clause except for the copyin and copyprivate clauses

Explanation:
According to the reference, threadprivate symbols cannot be used in the
DSA clauses, which in turn means that such symbols can be skipped for
default privatization.

Fixes llvm#123535

[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (llvm#125826)

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

For SWDEV-512631 and SWDEV-512633

[AMDGPU] Add missing gfx architectures to AddFlangOffloadRuntime.cmake (llvm#125827)

[mlir][bufferization] Module bufferization: Delete obsolete code (llvm#127455)

Delete equivalenceAnalysis, which has been incorporated into the
getAliasingValues API. Also add an additional test case to ensure that
equivalence is properly propagated across function boundaries.

[AMDGPU][MLIR] Replace gfx940 and gfx941 with gfx942 in MLIR (llvm#125836)

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

For SWDEV-512631

[X86] matchUnaryShuffle - add support for matching 512-bit extension patterns. (llvm#127643)

Handles both BWI and non-BWI cases (skips PMOV*XBW without BWI).

The vector-interleaved-store-i8-stride-8.ll VPTERNLOG diffs are due to
better value tracking now recognizing the zero-extension patterns where
before it was any-extension

[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (llvm#126762)

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.

For SWDEV-512631

[BitcodeReader] Use poison instead of undef to represent unsupported constexprs in metadata (llvm#127665)

Metadata that references unsupported constant expressions can be
represented with poison metadata instead of undef metadata.

[CodeGen] Use __extendhfsf2 and __truncsfhf2 by default (llvm#126880)

The standard libcalls for half to float and float to half conversion are
__extendhfsf2 and __truncsfhf2. However, LLVM currently uses
__gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these
libcalls are an ARM-ism and only provided by libgcc on that platform.
compiler-rt always provides both libcalls.

Use the standard libcalls by default, and only use the __gnu libcalls on
ARM.

[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (llvm#126763)

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.

Documentation changes will follow.

For SWDEV-512631

[clang][bytecode] Fix three-way unordered non-pointer comparisons (llvm#127759)

This can happen with non-pointers, but we shouldn't diagnose it in
that case.

[AMDGPU] Remove FeatureForceStoreSC0SC1 (llvm#126878)

This was only used for gfx940 and gfx941, which have since been removed.

For SWDEV-512631

[AMDGPU][docs] Replace gfx940 and gfx941 with gfx942 in llvm/docs (llvm#126887)

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all documentation occurrences of gfx940/gfx941 except
for the gfx940 ISA description, which will be the subject of a separate
PR.

For SWDEV-512631

[X86] getFauxShuffleMask - add support for vXi64/vXf64 concat_vectors decoding (llvm#127630)

Similar to insert_subvector - limit this to vXi64 vector cases to make the most of cross lane shuffles (for now).

[AMDGPU][docs][NFC] Replace gfx940 with gfx942 in the gfx940 ISA doc (llvm#126906)

gfx940 and gfx941 are no longer supported. This is the last one of a
series of PRs to remove them from the code base.

The ISA documentation still contains a lot of links and file names with
the "gfx940" identifier. Changing them to "gfx942" is probably not worth
the cost of breaking all URLs to these pages that users might have saved
in the past.

For SWDEV-512631

[Flang] Check if two ArrayConstructor's are Equal (llvm#121181)

This also includes comparing the two ImpliedDo

Details

For ArrayConstructor, check if x and y have the same
elements and type
For ImpliedDo, check if x and y have the same lower,
upper, stride and values

Fixes: llvm#104526

[clang] Lower non-builtin sincos[f|l] calls to llvm.sincos.* when -fno-math-errno is set (llvm#121763)

This will allow vectorizing these calls (after a few more patches). This
should not change the codegen for targets that enable the use of AA
during the codegen (in TargetSubtargetInfo::useAA()). This includes
targets such as AArch64. This notably does not include x86 but can be
worked around by passing -mllvm -combiner-global-alias-analysis=true
to clang.

Follow up to llvm#114086.

[libclc] Improve nextafter behaviour around zero (llvm#127469)

This commit improves the behaviour of (_clc)nextafter around zero.
Specifically, the nextafter value of very small negative numbers in the
positive direction is now negative zero. Previously we'd return positive
zero.

This behaviour is not required as far as OpenCL is concerned: at least,
the CTS isn't testing for it. However, this change does bring our
implementation into bit-equivalence with (libstdc++'s implementation of)
std::nextafter, tested on all possible values of 32-bit float towards
both positive and negative INFINITY.

Furthermore, since the implementation of libclc's floating-point 'rtp'
and 'rtz' conversions use __clc_nextafter, the previous behaviour was
resulting in CTS validation issues. For example, when converting float
-0x1.000002p-25 to half, rounding towards zero or positive infinity,
nextafter was returning +0.0, whereas the correct conversion requires us
to return -0.0.

We could work around this issue in the conversion functions, but since
the change to nextafter is small enough and the behaviour around zero
matches libstdc++, the fix feels at home there.

This commit also converts several variables to unsigned types to avoid
undefined behaviour surrounding signed underflow on the subtractions.
It also converts some variables to be kept in floating-point types, using
fabs to get the absolute value rather than by bit-hacking.
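
The behaviour described above can be probed on a host toolchain with a short sketch. It uses std::nextafter from the host C++ library rather than libclc's __clc_nextafter, and the helper names are hypothetical. No particular sign is asserted here; the commit message reports negative zero for libstdc++.

```cpp
#include <cmath>
#include <limits>

// Steps the smallest-magnitude negative float one step towards +infinity.
inline float step_up_from_tiny_negative() {
  const float tiny_negative = -std::numeric_limits<float>::denorm_min();
  return std::nextafter(tiny_negative,
                        std::numeric_limits<float>::infinity());
}

// Reports whether a value is a zero that carries a negative sign bit.
inline bool is_negative_zero(float value) {
  return value == 0.0f && std::signbit(value);
}
```

Calling is_negative_zero(step_up_from_tiny_negative()) shows which zero the host library produces, which is the property the 'rtp' and 'rtz' conversions depend on.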

[MLIR][Affine] Fix affine.parallel op verifier (llvm#127611)

Fix the affine.parallel op verifier, which was missing a check for lower
or upper bound maps with zero results. lb/ub maps should have at least
one result.

Fixes: llvm#120186

[flang][OpenMP]Improve support for DECLARE REDUCTION (llvm#127088)

Part of the DECLARE REDUCTION was already supported by the parser, but
the semantics to add the reduction identifier wasn't implemented.

The semantics would not accept the name given by the reduction, so a few
lines were added to support that.

Some tests were in place but not quite working, so those were fixed up too.
This patch also adds new tests for unparsing and parse-tree, as well as
for checking the symbolic name being generated.

Lowering of DECLARE REDUCTION is not supported in this patch, and a test
that it hits the relevant TODO is in this patch (most of this was
already existing, but not actually testing the TODO message).

[AArch64] Add optional extensions enabled on Grace (llvm#127620)

Enable optional ISA extensions on Grace when mcpu=grace
is used: sve2-sm4, sve2-aes, sve2-sha3.
Grace is no longer an alias, but a separate CPU definition.

[lldb] Add LineTable::{upper,lower}_bound (llvm#127519)

The motivation is llvm#123622 and the fact that it is hard to find the last
line entry in a given range. FindLineEntryByAddress(range_end-1) is
the best we have, but it's not ideal because it has a magic -1 and
it relies on there being a line entry at that address (generally, it
should be there, but if for some reason it isn't, we might end up ignoring
the entries that are there, or -- like my incorrect fix in llvm#123622 did
-- iterating through the entire line table).

What we really want is to get the last entry that exists in the given
range. Or, equivalently (and more STL-like) the first entry after that
range. This is what these functions do. I've used the STL names since
they do pretty much exactly what the standard functions do (the main
head-scratcher comes from the fact that our entries represent ranges
rather than single values).
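
The "first entry after the range" idea can be sketched with the standard algorithm on a simplified table. Entry and first_entry_at_or_after are hypothetical stand-ins, not the LineTable API, and the sketch treats each entry as a single start address rather than a range.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a line table entry, keyed by start address.
struct Entry {
  std::uint64_t start;
  unsigned line;
};

// Returns the index of the first entry whose start address is >= `addr`,
// or entries.size() if there is none. `entries` must be sorted by `start`.
inline std::size_t first_entry_at_or_after(const std::vector<Entry>& entries,
                                           std::uint64_t addr) {
  auto it = std::lower_bound(
      entries.begin(), entries.end(), addr,
      [](const Entry& e, std::uint64_t a) { return e.start < a; });
  return static_cast<std::size_t>(it - entries.begin());
}
```

For a half-open range [begin, end), the entries starting inside it are the indices from first_entry_at_or_after(entries, begin) up to, but not including, first_entry_at_or_after(entries, end), so no magic -1 is needed to find the last one.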

The functions can also be used to simplify the maze of if statements
in FindLineEntryByAddress, but I'm keeping that as a separate patch.
For now, I'm just adding some unit testing for that function to gain
more confidence that the patch does not change the function behavior.

Co-authored-by: Jonas Devlieghere [email protected]

[NFC][MLIR] Make file-local cl::opt global variables static (llvm#126714)

This is per the style guide: make file-scope symbols static whenever possible.

Fix llvm#125983.

AMDGPU: Avoid double attribute lookup for register count attributes (llvm#127782)

[libclc] Clean up directory search procedure (llvm#127783)

During a recent change, the build system accidentally dropped the
(theoretical) support for the CLC builtins library to build
target-specific builtins from the 'amdgpu' directory, due to a change in
variable names. This functionality wasn't being used but was spotted
during another code review.

This commit takes the opportunity to clean up and better document the
code that manages the list of directories to search for builtin
implementations.

While fixing this, some references to now-removed SOURCES files were
discovered which have been cleaned up.

[NVPTX] Add tcgen05.cp/shift intrinsics (llvm#127669)

This patch adds intrinsics for tcgen05.cp and
tcgen05.shift instructions.

lit tests are added and verified with a
ptxas-12.8 executable.

Docs are updated in the NVPTXUsage.rst file.

Signed-off-by: Durgadoss R [email protected]

[mlir][dlti] Fix query keys preallocation (llvm#127786)

Fixes upfront space allocation after llvm#126716

[libc++] Guard include of <features.h> with __has_include (llvm#127691)

Some configurations define AMDGPU or NVPTX on platforms that
don't provide <features.h>, such as CUDA on Mac.
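
The guard pattern looks roughly like the sketch below. EXAMPLE_HAS_FEATURES_H and has_features_h are hypothetical names used only to make the outcome observable; they are not part of libc++.

```cpp
// Include <features.h> only where the platform actually ships it.
#if defined(__has_include)
#  if __has_include(<features.h>)
#    include <features.h>
#    define EXAMPLE_HAS_FEATURES_H 1  // hypothetical, for illustration only
#  endif
#endif

#ifndef EXAMPLE_HAS_FEATURES_H
#  define EXAMPLE_HAS_FEATURES_H 0
#endif

// True when the guarded include above found the header.
constexpr bool has_features_h() { return EXAMPLE_HAS_FEATURES_H != 0; }
```

On a platform without the header, the translation unit still compiles and has_features_h() simply returns false.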

[clang][bytecode] Fix allocating primitive arrays of unknown bound (llvm#127788)

[libc++] Deprecate and remove member types of hash in <variant> (llvm#127758)

These member types were deprecated in C++17 by P0174R2 and removed in
C++20 by P0619R4, but the changes in <variant> seem to be missing.

Drive-by: Replace one _NOEXCEPT with noexcept as the hash
specialization is C++17-and-later only.

[NFC] using isUndef() instead of getOpcode() == ISD::UNDEF (llvm#127713)

[LAA] Make Ptr argument optional in isNoWrap. (llvm#127410)

Update isNoWrap to make the IR Ptr argument optional. This allows using
isNoWrap when dealing with things like pointer-selects, where a select
is translated to multiple pointer SCEV expressions, but there is no IR
value that can be used. We don't try to retrieve pointer values for the
pointer SCEVs and using info from the IR would not be safe. For example,
we cannot use inbounds, because the pointer may never be accessed.

PR: llvm#127410

[MLIR][Linalg] Introduce Python API for linalg.batch_matmul Ops. (llvm#127614)

As linalg.batch_matmul has been moved into tablegen from OpDSL, its
derived Python wrapper no longer exists. This patch adds the required
Python wrapper.

Also refactors the BatchmatmulOp printer to make it consistent with its
parser.

Revert "[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (llvm#116984)"

This reverts commit 7763119 (leaving the modifications from 03cb46d).

[clang] Lower modf builtin using llvm.modf intrinsic (llvm#126750)

This updates the existing modf[f|l] builtin to be lowered via the
llvm.modf.* intrinsic (rather than directly to a library call).
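
For reference, a small sketch of what modf computes at the source level. Parts and split are hypothetical names; the lowering change described above is not expected to alter these results.

```cpp
#include <cmath>

// Hypothetical holder for the two results of modf.
struct Parts {
  double integral;
  double fractional;
};

// Splits `x` into integral and fractional parts, both carrying the sign of x.
inline Parts split(double x) {
  double integral = 0.0;
  const double fractional = std::modf(x, &integral);
  return Parts{integral, fractional};
}
```

For example, split(3.25) yields an integral part of 3.0 and a fractional part of 0.25.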

flang: Fix build with latest libc++ (llvm#127362)

I think this first stopped working with
9548366. This patch fixes the following
error:

/home/runner/work/llvm-project/llvm-project/flang/runtime/io-api-minimal.cpp:153:11:
error: '__libcpp_verbose_abort' is missing exception specification
'noexcept'
153 | void std::__libcpp_verbose_abort(char const *format, ...) {
| ^
| noexcept
/mnt/build/bin/../include/c++/v1/__verbose_abort:30:28: note: previous
declaration is here
30 | __printf__, 1, 2) void __libcpp_verbose_abort(const char* __format,
...) _LIBCPP_VERBOSE_ABORT_NOEXCEPT;
| ^
1 error generated.

[flang] Implement !DIR$ UNROLL_AND_JAM [N] (llvm#125046)

This patch implements support for the UNROLL_AND_JAM directive to enable
or disable unrolling and jamming on a DO LOOP.
It must be placed immediately before a DO LOOP and applies only to the
loop that follows. N is an integer that specifies the unrolling factor.
This is done by adding an attribute to the branch into the loop in LLVM
to indicate that the loop should be unrolled and jammed.

[MLIR][NFC] Use base alias for constructor inheritance (llvm#127756)

During my previous cleanup (llvm#127403), I did not notice that we defined a
type alias for the base class. This type alias allows us to use the
shorter form Base::Base, and this PR switches to that.

[libclc][NFC] Reformat ep_log.cl

[clang][Sema] Fix initialization of NonTypeTemplateParmDecl... (llvm#121768)

...when there are invalid constraints.

When attaching a TypeConstraint, in case of error, the trailing
pointer that is supposed to point to the constraint is left
uninitialized.
Sometimes the uninitialized value will be a nullptr, but at other
times it will not. If we traverse the AST (for instance, dumping it, or
when writing the BMI), we may get a crash depending on the value that
was left. The serialization may also contain a bogus value.

In this commit, we always initialize the PlaceholderTypeConstraint
with nullptr, to avoid accessing this uninitialized memory.

This does not affect only modules, but it causes a segfault more
consistently when they are involved.

The test case was reduced from mp-units.

Co-authored-by: Erich Keane [email protected]

[llvm-dwarfdump] Print number of out-of-line functions described by DWARF (llvm#127233)

Some of the functions in #functions may have several inlined
instances, but also an out-of-line definition.

Therefore, for complex enough DWARF input, #functions - #inlined functions would not give us the number of out-of-line function
definitions.
llvm-dwarfdump, however, already keeps track of those; print it as
part of the statistics, as this number is useful in certain scenarios.

[libc] add LLVM_LIBC_CAST macro. (llvm#127319)

related: llvm#127238

This patch adds a macro called LLVM_LIBC_CAST, similar to
__BIONIC_CAST, for type conversion in endian.h.
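
The general shape of such a cast macro can be sketched as follows. EXAMPLE_CAST and example_bswap16 are hypothetical and are not the actual LLVM_LIBC_CAST definition; the idea is that one header can use a named cast in C++ and a C-style cast in C.

```cpp
#include <cstdint>

// Hypothetical macro, for illustration only: expands to a named cast in C++
// and to a C-style cast when the header is consumed from C.
#ifdef __cplusplus
#  define EXAMPLE_CAST(cast, type, value) (cast<type>(value))
#else
#  define EXAMPLE_CAST(cast, type, value) ((type)(value))
#endif

// Byte-swaps a 16-bit value, in the style of an endian.h helper.
inline std::uint16_t example_bswap16(std::uint16_t x) {
  return EXAMPLE_CAST(static_cast, std::uint16_t, (x << 8) | (x >> 8));
}
```

The shift expression is evaluated as int after promotion, so the cast back to the 16-bit type is what truncates the result.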

[Clang] Do not try to transform invalid bindings (llvm#125658)

In the presence of an invalid structured binding decomposition, some
binding packs may be invalid and trying to transform them would produce
a recovery expression that does not contain a pack, leading to
assertions in places where we would expect a pack at that stage.

Fixes llvm#125165

[Clang] Add release note for llvm#127623 (llvm#127815)

While reviewing llvm#127623, I missed that it didn’t have a release note.

[libc++] Avoid code duplication in strings operator+ overloads (llvm#126048)

[Analysis] Avoid repeated hash lookups (NFC) (llvm#127743)

[AsmPrinter] Avoid repeated hash lookups (NFC) (llvm#127744)

[CodeGen] Avoid repeated hash lookups (NFC) (llvm#127745)

[Object] Avoid repeated hash lookups (NFC) (llvm#127746)

[Support] Avoid repeated hash lookups (NFC) (llvm#127747)

[X86] Avoid repeated hash lookups (NFC) (llvm#127748)

[libclc] Explicitly qualify private address spaces (llvm#127823)

Doing so provides stability when compiling the builtins in a mode in
which unqualified pointers may be interpreted as being in the generic
address space, such as in OpenCL 3.0.

We eventually want to provide 'generic' overloads of the builtins in
libclc so this prepares the ground a little better.

It could be argued that having the internal CLC helper functions be
unqualified is more flexible, in case it's better for a target to have
the pointers in the generic address space. This commits to the private
address space for more stability across different OpenCL environments.

[lldb] Make GetOutputStreamSP and GetErrorStreamSP protected (llvm#127682)

This makes GetOutputStreamSP and GetErrorStreamSP protected members of
Debugger. Users who want to print to the debugger's stream should use
GetAsyncOutputStreamSP and GetAsyncErrorStreamSP instead. The few
remaining stragglers have been migrated.

Revert "Reapply [CaptureTracking][FunctionAttrs] Add support for CaptureInfo (llvm#125880)"

This reverts commit 0fab404.
Seems to break LTO builds of clang on Windows, see comments on
llvm#125880

[lldb] Gardening in StreamAsynchronousIO (NFC) (llvm#127717)

A handful of minor improvements to StreamAsynchronousIO:

Document the class.
Use a named enum value to distinguish between stdout and stderr.
Add missing period to comment.
Clear the string instead of assigning to it.
Eliminate color argument.

[libc++] Optimize ranges::move{,_backward} for vector<bool>::iterator (llvm#121109)

As a follow-up to llvm#121013 (which optimized ranges::copy) and llvm#121026
(which optimized ranges::copy_backward), this PR enhances the
performance of std::ranges::{move, move_backward} for
vector<bool>::iterator, addressing a subtask outlined in issue llvm#64038.

The optimizations bring performance improvements analogous to those
achieved for the {copy, copy_backward} algorithms: up to 2000x for
aligned moves and 60x for unaligned moves. Moreover, comprehensive
tests covering up to 4 storage words (256 bytes) with odd and even bit
sizes are provided, which validate the proposed optimizations in this
patch.

[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (llvm#127240)

The previous PR, llvm#122950, was reverted because it hit a buildbot
failure. Another patch was merged while that PR was under review, which
left one test out of date.

This PR reopens it and fixes the issue.

[flang] AliasAnalysis: Handle fir.load on hlfir.designate (llvm#127107)

For example, determine that the address in obj%p below cannot alias
the address of v:

module m
  type :: ty
    real, pointer :: p
  end type ty
end module m
subroutine test()
  use m
  real, target :: t
  real :: v
  type(ty) :: obj
  obj%p => t
  v = obj%p
end subroutine test

[clang] handle fp options in __builtin_convertvector (llvm#125522)

This patch allows using fpfeatures pragmas with __builtin_convertvector:

added TrailingObjects with FPOptionsOverride and methods for handling
it to ConvertVectorExpr
added support for codegen, node dumping, and serialization of
fpfeatures contained in ConvertVectorExpr

[AMDGPU] Remove unused variables. NFC

[CIR] Add additional frontend actions (llvm#127249)

Add frontend actions to support emitting assembly, bitcode, and object
files when compiling with ClangIR. This change also correctly sets and
propagates the target triple in the MLIR and LLVM modules, which was a
necessary prerequisite for emitting assembly and object files.

[InstCombine] avoid extra instructions in foldSelectICmpAnd (llvm#127398)

Disable fold when it will result in more instructions.

[ELF,test] Remove unneeded -o /dev/null

When the script has executed cd %t, it is fine to use the output
file a.out.
(We don't want to rely on lit's default PWD to support lit compatible
runners. Therefore -o /dev/null is used when PWD has not been changed
to a %t derived path.)

Remove header file spuriously added by 9905728.

[InstCombine] handle trunc to i1 in foldSelectICmpAndBinOp (llvm#127390)

For trunc nuw, this saves an instruction; otherwise it only produces other
instructions without the select, the same behavior as for the bit test before.

proof: https://alive2.llvm.org/ce/z/a6QmyV

[ELF,test] Clean up aarch64-relocs.s

[SLP][NFC]Replace undefs by zeroinitializer

[libc++][test] Refactor tests for std::{copy, move, fill} algorithms (llvm#120909)

This refactor includes the following changes:

Refactor similar tests using types::for_each to remove redundant code;
Explicitly include the missing header type_algorithms.h in some test files;
Some tests scattered in different test functions with ad-hoc names
(e.g., test5(), test6()) but belong to the same kind are now grouped
into one function (test_struct_array()).

Include test folder in the Clang Static Analyzer team mentions (llvm#127810)

See
https://discourse.llvm.org/t/taking-ownership-of-clang-test-analysis/84689

[VPlan] Remove dead exit block handling code in HCFGBuilder.

The mapping of IR ExitBB to a VPBB isn't used. It also sets an incorrect
VPBB for the ExitBB; the region's successor is the middle block, not the
exit block.

It also unnecessarily triggers an assertion after 38376de.

[X86] combineX86ShuffleChain - provide list of combined shuffle nodes, replace HasVariableMask bool arg. NFC. (llvm#127826)

Minor NFC refactor before making better variable mask combining decisions - isTargetShuffleVariableMask doesn't discriminate between fast (AND, PSHUFB etc.) and slow (VPERMV3 etc.) variable shuffles, so an opaque HasVariableMask is only of limited use.

[clangd] Avoid round-trip from SourceLocation to clangd::Range and back in SymbolCollector::handleMacros() (llvm#127757)

[mlir][tosa] Align check variables naming to use capitals (llvm#127830)

Move to using capital letters for variable captures in LIT.

Signed-off-by: Georgios Pinitas [email protected]

[Clang] [Sema] Combine fallout warnings to just one warning (llvm#127546)

This merges several falloff and noreturn-related warnings and
removes unused diagnostic arguments.

Changes:

warn_maybe_falloff_nonvoid_function and
warn_falloff_nonvoid_function, warn_maybe_falloff_nonvoid_coroutine
and warn_falloff_nonvoid_coroutine,
warn_maybe_falloff_nonvoid_lambda and warn_falloff_nonvoid_lambda
were combined into warn_falloff_nonvoid,
err_maybe_falloff_nonvoid_block and err_falloff_nonvoid_block were
combined into err_falloff_nonvoid
err_noreturn_block_has_return_expr and
err_noreturn_lambda_has_return_expr were merged into
err_noreturn_has_return_expr with the same semantics as
warn_falloff_nonvoid or err_falloff_nonvoid.
Removed some diagnostic args that weren’t being used by the diagnostics.

[GlobalOpt] Don't query TTI on a llvm.memcpy declaration. (llvm#127760)

Querying TTI creates a Subtarget object, but an llvm.memcpy declaration
doesn't have target-cpu and target-feature attributes like functions
with definitions. This can cause a warning to be printed on RISC-V
because the target-abi in the Module requires floating point, but the
subtarget features don't enable floating point. So far we've only seen
this in LTO when an -mcpu is not supplied for the TargetMachine.

To fix this, get TTI for the calling function instead.

Fixes the issue reported here
llvm#69780 (comment)

[MC] Remove MCRegister::isStackSlot. (llvm#127755)

Stack slots should only be stored in Register. The only caller was
Register::isStackSlot so just inline it there.

[mlir][tosa] Update SelectOp's input names to match TOSA specification (llvm#127833)

Updated:

pred to input1
on_true to input2
on_false to input3

Signed-off-by: Jerry Ge [email protected]

[clangd] Update XRefs to support overridden ObjC methods (llvm#127109)

Support finding implementors of a protocol and discovering subclasses for ObjC interfaces via the implementations call
Support jumping to the overridden method when you trigger goto definition on an override
Properly find references to overridden methods

[MLIR] Fix doc build (NFC)

AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (llvm#127711)

For targets that support IEEE fminimum_num/fmaximum_num, the
corresponding _min_num_fXY/_max_num_fXY instructions themselves
already did the canonicalization for the inputs. As a result, we do not
need to explicitly canonicalize the inputs for fminnum/fmaxnum.

[Modules] Add clang/Lex/HLSLRootSignatureTokenKinds.def to clang's modulemap (llvm#127839)

b41b86a added a new textual header
clang/Lex/HLSLRootSignatureTokenKinds.def but did not add it to
clang's module map. This causes build failure when building llvm with
-DLLVM_ENABLE_MODULES=ON. This PR adds the new textual header to the
module map and fixes the build break.

Fixing rdar://145148093.

[HLSL] Implement the 'and' HLSL function (llvm#127098)

Addresses llvm#125604

Implements and as an HLSL builtin function
The and HLSL builtin function gets lowered to the LLVM and
instruction

[MLIR][LLVMIR] Add support for asin acos atan intrinsics op (llvm#127317)

This is similar to llvm#125748

[RISCV] Add a pass to remove ADDI by reassociating to fold into load/store address. (llvm#127151)

SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.

One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array then 2 different SHXADD instructions will be used to
add the scaled indices. Finally the SHXADD will be used by different
load instructions. We can remove the ADDI by folding the offset into
each load.

This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.

This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.

This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.
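
The "array in a struct with two different indices" case described above can be sketched in source form as follows. This is only an illustrative input, assuming a hypothetical `Record` type and `sum_two` function that do not appear in the patch; the instructions a RISC-V compiler actually emits for it are not shown or claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct: the array does not start at offset 0, so reaching
// `values` needs a constant offset from the base address of the struct.
struct Record {
  std::int64_t header[4];
  std::int32_t values[64];
};

// Two loads from the same array with different indices. Both addresses
// share the "base + offset of values" part and differ only in the scaled
// index, which is the address-arithmetic shape the commit message describes.
std::int32_t sum_two(const Record &r, std::size_t i, std::size_t j) {
  assert(i < 64 && j < 64);  // keep the sketch within bounds
  return r.values[i] + r.values[j];
}
```

Because the constant offset of `values` is small, it is the kind of offset that could in principle be carried by each load instead of being added once up front.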

[OpenMP][NFC] Remove unused clock function types and globals (llvm#127684)

[OpenMP][NFC] Remove unused __kmp_dispatch_lock global (llvm#127686)

[clang][perf-training] Support excluding LLVM build from PGO training (llvm#126876)

Using LLVM build itself for PGO training is convenient and a great
starting point but it also has several issues:

LLVM build implicitly depends on tools other than CMake and C/C++
compiler and if those tools aren't available in PATH, the build will
fail.
LLVM build also requires standard headers and libraries which may not
always be available in the default location requiring an explicit
sysroot.
Building a single configuration (-DCMAKE_BUILD_TYPE=Release) only
exercises the -O3 pipeline and can pessimize other configurations.
Building for the host target doesn't exercise all other targets.
Since LLVMSupport is a static library, this doesn't exercise the
linker (beyond what the CMake itself does).

Rather than using the LLVM build, ideally we would provide a more minimal,
purpose-built corpus. While we're working on building such a corpus,
provide a CMake option that lets vendors disable the use of the LLVM build
for PGO training.

[MLIR] Fix rewrite of ops with vector operands to LLVM on GPU (llvm#127844)

There was a discrepancy between the type-converter and rewrite-pattern
parts of conversion to LLVM used in various GPU targets, at least ROCDL
and NVVM:

The TypeConverter part was handling vectors of arbitrary rank,
converting them to nests of !llvm.array< ... > with a vector at the
inner-most dimension:
https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp#L629-L655
The rewrite pattern part was not handling llvm.array:
https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp#L594-L596

That led to conversion failures when lowering math dialect ops on
rank-2 vectors, as in the testcase being added in this PR.

This PR fixes this by reusing a shared utility already used in other
conversions to LLVM:

https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/LLVMCommon/VectorPattern.cpp#L80-L104

Signed-off-by: Benoit Jacob [email protected]

[SandboxIR] OpaqueValue (llvm#127699)

This patch implements a new subclass of the Value class used for Sandbox
IR Values that we don't support, like metadata or inline asm. The goal
is to never have null sandboxir::Value objects, because this is not the
expected behavior.

Revert "[RISCV] Add a pass to remove ADDI by reassociating to fold into load/store address. (llvm#127151)"

This reverts commit c3ebbfd.

Seeing some test failures on the build bot.

[mlir][Linalg] Fix linalg.generic iteration domain collapse for dynamic dims (llvm#118208)

This PR fixes how the iteration domain of linalg.generic is collapsed when
fusing with tensor.expand_shape. Previously, the output_shape for
tensor.expand_shape was inferred, which does not always work outside of
some special cases.

This patch makes the logic explicitly set the bounds of the new
collapsed iteration domain, because we already know them.

Co-authored-by: Jakub Kuderski [email protected]

Recommit "[RISCV] Add a pass to remove ADDI by reassociating to fold into load/store address. (llvm#127151)"

Tests have been re-generated with recent scheduler changes.

Original message:

SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.

One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array then 2 different SHXADD instructions will be used to
add the scaled indices. Finally the SHXADD will be used by different
load instructions. We can remove the ADDI by folding the offset into
each load.

This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.

This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.

This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.

[gn build] Port 26e3750

[DSE] Make iter order deterministic in removePartiallyOverlappedStores. NFC (llvm#127678)

In removePartiallyOverlappedStores we iterate over
InstOverlapIntervalsTy which is a DenseMap. Change that map into using
MapVector to ensure that we apply the transforms in a deterministic
order. I've only seen that the order matters if starting to use names
for the instructions created when doing the transforms. But such things
are a bit annoying when debugging etc.
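
The difference between a hash map and an insertion-ordered map can be sketched as below. This is a minimal stand-in written for illustration, not LLVM's `MapVector` implementation; the class name `InsertionOrderedMap` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical insertion-ordered map: a hash index for lookup plus a vector
// that remembers the order in which keys were first inserted.
template <typename K, typename V>
class InsertionOrderedMap {
  std::unordered_map<K, std::size_t> index_;  // key -> position in entries_
  std::vector<std::pair<K, V>> entries_;      // entries in insertion order

public:
  // Returns the value for `key`, default-constructing it on first use.
  V &operator[](const K &key) {
    auto [it, inserted] = index_.try_emplace(key, entries_.size());
    if (inserted)
      entries_.emplace_back(key, V{});
    return entries_[it->second].second;
  }

  // Returns the entry at a given insertion position.
  const std::pair<K, V> &at_position(std::size_t i) const {
    assert(i < entries_.size());
    return entries_[i];
  }

  std::size_t size() const { return entries_.size(); }

  // Iteration walks the vector, so the order depends only on insertions,
  // never on hash values or pointer addresses.
  auto begin() const { return entries_.begin(); }
  auto end() const { return entries_.end(); }
};
```

Iterating such a container visits keys in the order they were first inserted, which is what makes the sequence of applied transforms reproducible from run to run.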

[GlobalOpt] Remove Function* argument from tryWidenGlobalArrayAndDests. NFC (llvm#127848)

This is only used to get the Module and the LLVMContext. We can get both
of those from the GlobalVariable*.

[mlir][Vector] Deprecate vector.extractelement/vector.insertelement (llvm#113829)

See
https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops/71116/6
for more information.

[MemProf] Support cloning through recursive cycles (llvm#127429)

In order to facilitate cloning of recursive cycles, we first identify
backedges using a standard DFS search from the root callers, then
initially defer recursively invoking the cloning function via those
edges. This is because the cloning opportunity along the backedge may
not be exposed until the current node is cloned for other non-backedge
callers that are cold after the earlier recursive cloning, resulting
in a cold predecessor of the backedge. So we recursively invoke the
cloning function for the backedges during the cloning of the current
node for its caller edges (which were sorted to enable handling cold
callers first).

There was no significant time or memory overhead measured for several
large applications.
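
The first step described above, identifying backedges with a DFS from the root callers, can be sketched as follows. This is a generic textbook DFS over an adjacency list, not the MemProf code; `Graph`, `Mark`, `dfs`, and `find_backedges` are hypothetical names, and the deferral and cloning steps are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Adjacency list: g[n] holds the nodes reachable by one edge from node n.
using Graph = std::vector<std::vector<int>>;

enum class Mark { Unvisited, OnStack, Done };

// Depth-first walk. An edge that points at a node still on the DFS stack
// closes a cycle, so it is recorded as a backedge.
void dfs(const Graph &g, int node, std::vector<Mark> &mark,
         std::vector<std::pair<int, int>> &backedges) {
  mark[node] = Mark::OnStack;
  for (int next : g[node]) {
    if (mark[next] == Mark::OnStack)
      backedges.emplace_back(node, next);
    else if (mark[next] == Mark::Unvisited)
      dfs(g, next, mark, backedges);
  }
  mark[node] = Mark::Done;
}

// Runs the DFS from every root and returns all backedges as (from, to) pairs.
std::vector<std::pair<int, int>> find_backedges(const Graph &g,
                                                const std::vector<int> &roots) {
  std::vector<Mark> mark(g.size(), Mark::Unvisited);
  std::vector<std::pair<int, int>> backedges;
  for (int root : roots) {
    assert(root >= 0 && static_cast<std::size_t>(root) < g.size());
    if (mark[root] == Mark::Unvisited)
      dfs(g, root, mark, backedges);
  }
  return backedges;
}
```

Once the backedges are known, a traversal can skip them on the first pass and come back to them later, which is the deferral the commit message describes.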

[InstCombine] Test for regession with trunc in foldSelectICmpAnd

[SandboxIR][NFC] Change order of ifs in Context::getOrCreateValueInternal() (llvm#127891)

Move the most common if statement to the top and the least common ones
to the bottom. This should save CPU cycles during compilation.

This patch also prefixes the llvm variables with the LLVM prefix to make
the naming convention in this function more uniform. For example C to
LLVMC.

Check the type of Objective-C++ instance variables in WebKit member variable checkers. (llvm#127570)

Like a C++ member variable, every Objective-C++ instance variable must
be a RefPtr, Ref, CheckedPtr, or CheckedRef to an object, not a raw
pointer or reference.

adding rootsignature to obj2yaml

adding test

removing old test

remove useless includes

addressing comments

updating test

removing useless header

fix formating

renaming test

addressing pr comments

adding str to ROOT_ELEMENT_FLAG

formating

refactoring to follow llvm standards

addressing comments

clean up

remove version

fix pr

adding dxil-dis test

adding compatibility test

addressing test concerns

clean up

addressing comments

adding fail test

adding comment

adding few more tests

format

cleanup

adding metadata extraction

moving root signature to it's own pass

formating

removing useless imports

fixing pr changes

adding some asserts

format

fixing assert

cleaning

clean up

addressing comments

removing version

fix test

addressing PR Comments

fix test

filtering root signatures not associated with entry function

separating parsing and validation

improve error handling

clean up

formating

addressing comments and fix tests

formating

addressing pr comments

addressing PR comments

addressing pr comments

removing copies from root signature use in dx container globals

adding more tests

maybe fix test?

try fix format

removing test

adding llvm unreachable and testing test

stopping compilation if root signature error were emitted

making sure Error tests fail

adding root constants

refactoring root signature analysis to return a map instead

addressing pr comments

clean up

addressing pr comments

implementing find interface for RootSignatureAnalysisWrapper

adding test for null function

fix root signature test error

fix other functions are checked

adding missing continue

adding few more tests

adding yaml2obj support

adding support for obj2yaml and initial tests

multiple parameters support and more testing

clean up

fixing formating

reapply rebase fix

clean up

addressing pr comments

first working version

formating

moving the offset logic to it's own class

refactoring to remove use of map and string

addressing comments

using buffer_ostream

remove getsize

clean up

addressing pr comments

clean up

Merge branch 'refactor/improve-offset-calculation' into users/joaosaffran/127840

clean up

moving initializer arround

addressing pr comments

The corresponding feature was implemented in LLVM 18 (by llvm#67799), but this FTM wasn't added before.

It is always safe to add poison-generating flags for `BinOp Y, Identity`.
Proof: https://alive2.llvm.org/ce/z/8BLEpq and https://alive2.llvm.org/ce/z/584Bb4

Then we can propagate flags from one of the arms:
```
select Cond, Y, (BinOp flags Y, Z) ->
select Cond, (BinOp flags Y, Identity), (BinOp flags Y, Z) ->
BinOp flags Y, (select Cond, Identity, Z)
```
This patch is proposed to avoid information loss caused by llvm#127390.
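
The value-level identity behind this fold can be checked with a small sketch, assuming `BinOp` is an unsigned add (identity 0). The function names are hypothetical, and the sketch does not model poison-generating flags such as nuw or nsw; it only shows that the two shapes compute the same value.

```cpp
#include <cstdint>

// Original shape: select Cond, Y, (BinOp Y, Z), with BinOp = add.
std::uint32_t select_of_binop(bool cond, std::uint32_t y, std::uint32_t z) {
  return cond ? y : y + z;
}

// Folded shape: BinOp Y, (select Cond, Identity, Z); 0 is the identity
// element for add, so the true arm still yields Y unchanged.
std::uint32_t binop_of_select(bool cond, std::uint32_t y, std::uint32_t z) {
  return y + (cond ? 0u : z);
}
```

Unsigned arithmetic is used so that wraparound is well defined in C++ and both functions agree for every input.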

Some check labels in the basic.ll test are simply f[1-3], which matches all cases where the function entry is hot, cold, or uncategorized. Since the actual test results for each label have mixed cases, the current labels can be considered ambiguous. This patch specifies exact matches for each label to ensure more precise validation and to prevent potential regressions in the future.

The `-fcf-protection` flag is now also used to enable CFI features for the RISC-V target, so it's not suitable to define `__CET__` solely based on the flag anymore. This patch moves the definition of the `__CET__` macro into X86 target hook, so only X86 targets with the `-fcf-protection` flag would enable the `__CET__` macro. See llvm#109784 and llvm#112477 for the adoption of `-fcf-protection` flag for RISC-V targets.

This change allows array variables to copy-initialize from other arrays. It also corrects a small error in HLSL C-Style casting that did not error on casting to arrays if elementwise and splat conversions fail. Fixes llvm#127551

) Lack of these entries caused clang to crash on the following code:
```c
__m256bh fun(__m256bh arg) {
  return arg;
}

__m256bh run() {
  __m256bh arg = {0};
  fun(arg);
}
```
This caused FastISel to fail, since it handled the call lowering based on the X86CallingConv table.

Curiously, if FastISel fails somewhere down the line and SelectionDAGISel is used as the fallback, the crash does not occur. The following code _does not_ crash:
```c
__m256bh fun(__m256bh arg) {
  return arg;
}

__m256bh run() {
  __m256bh arg = {0};
  return fun(arg);
}
```
This is puzzling to me. Obviously, if FastISel fails then the compiler falls back to something else to lower these calls, but since the X86CallingConv table _doesn't_ have entries for vNbf16, how does this other path manage not to crash? It has to use some other mechanism, one which doesn't use the table.

This raises the following questions:
- How is this lowering accomplished without, presumably, using the CallingConv entries?
- Why is the table not used? This points to some logic duplication (the FastISel way vs. the other, bug-free way).
- How to properly test this? There is a test for vNbf16 values, but it also must not be using the FastISel path.

This duplication of logic makes it hard to test this, since we don't have direct control over whether the FastISel path or the other one is used.

Nonetheless, this PR fixes the crash, though I didn't create a test for it, since I am not yet sure what it should look like. I would like to learn how the working non-FastISel mechanism works; I tried looking for it, but didn't yet manage to find anything

…amdgpu-sw-lower-lds. (llvm#121214) "infer-address-spaces" pass replaces all refinable generic pointers with equivalent specific pointers. At -O0 optimisation level, infer-address-spaces pass doesn't run in the pipeline. "amdgpu-sw-lower-lds" pass instruments memory operations on addrspace(3) ptrs. Since, extra addrspacecasts are present from lds to flat addrspaces at -O0 and the actual store/load memory instructions are now on flat addrspace, these addrspacecast need to be handled in the amdgpu-sw-lower-lds pass itself. This patch lowers the lds ptr first to the corresponding ptr in the global memory from the asan_malloc. Then replaces the original cast with addrspacecast from global ptr to flat ptr.

…7626) isOperandLegal mostly implemented the VALU operand rules, and largely ignored SALU restrictions. This theoretically avoids folding literals into SALU insts which already have a literal operand. This issue is currently avoided due to a bug in SIFoldOperands; this change will allow using raw operand legality rules. This breaks the formation of s_fmaak_f32 in SIFoldOperands, but it probably should not have been forming there in the first place. TwoAddressInsts or RA should generally handle that, and this only worked by accident.

…n `v_pk_mov_b32` (llvm#127715) This PR serves as a preliminary step, adding a test case for register coalescer on v_pk_mov_b32. It is intended to demonstrate the code changes introduced in an upcoming PR.

…option (llvm#127442) In SPGO lto mode, linker needs -lto-sample-profile option to set sample profile file. Linux adds this option by transferring fprofile-sample-use to -plugin-opt=sample-profile=, which is alias of lto-sample-profile. (in clang\lib\Driver\ToolChains\CommonArgs.cpp: tools::addLTOOptions()). But clang on Windows misses the transferring. So add it now.

For function declarations (i.e. func op has no entry block), the FunctionOpInterface methods `insertArgument` and `eraseArgument` will cause a segfault. This PR guards against manipulation of an empty entry block by checking whether the func op is external.

An example can be seen in google/heir#1324

The segfault trace:
```
#1 0x0000560f1289d9db PrintStackTraceSignalHandler(void*) /proc/self/cwd/external/llvm-project/llvm/lib/Support/Unix/Signals.inc:874:1
#2 0x0000560f1289b116 llvm::sys::RunSignalHandlers() /proc/self/cwd/external/llvm-project/llvm/lib/Support/Signals.cpp:105:5
#3 0x0000560f1289e145 SignalHandler(int) /proc/self/cwd/external/llvm-project/llvm/lib/Support/Unix/Signals.inc:415:1
#4 0x00007f829a3d9520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
#5 0x0000560f1257f8bc void __gnu_cxx::new_allocator<mlir::BlockArgument>::construct<mlir::BlockArgument, mlir::BlockArgument>(mlir::BlockArgument*, mlir::BlockArgument&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/new_allocator.h:162:23
#6 0x0000560f1257f84d void std::allocator_traits<std::allocator<mlir::BlockArgument> >::construct<mlir::BlockArgument, mlir::BlockArgument>(std::allocator<mlir::BlockArgument>&, mlir::BlockArgument*, mlir::BlockArgument&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:520:2
#7 0x0000560f12580498 void std::vector<mlir::BlockArgument, std::allocator<mlir::BlockArgument> >::_M_insert_aux<mlir::BlockArgument>(__gnu_cxx::__normal_iterator<mlir::BlockArgument*, std::vector<mlir::BlockArgument, std::allocator<mlir::BlockArgument> > >, mlir::BlockArgument&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/vector.tcc:405:7
#8 0x0000560f1257cf7e std::vector<mlir::BlockArgument, std::allocator<mlir::BlockArgument> >::insert(__gnu_cxx::__normal_iterator<mlir::BlockArgument const*, std::vector<mlir::BlockArgument, std::allocator<mlir::BlockArgument> > >, mlir::BlockArgument const&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/vector.tcc:154:6
#9 0x0000560f1257b349 mlir::Block::insertArgument(unsigned int, mlir::Type, mlir::Location) /proc/self/cwd/external/llvm-project/mlir/lib/IR/Block.cpp:178:13
#10 0x0000560f123d2a1c mlir::function_interface_impl::insertFunctionArguments(mlir::FunctionOpInterface, llvm::ArrayRef<unsigned int>, mlir::TypeRange, llvm::ArrayRef<mlir::DictionaryAttr>, llvm::ArrayRef<mlir::Location>, unsigned int, mlir::Type) /proc/self/cwd/external/llvm-project/mlir/lib/Interfaces/FunctionInterfaces.cpp:232:11
#11 0x0000560f0be6b727 mlir::detail::FunctionOpInterfaceTrait<mlir::func::FuncOp>::insertArguments(llvm::ArrayRef<unsigned int>, mlir::TypeRange, llvm::ArrayRef<mlir::DictionaryAttr>, llvm::ArrayRef<mlir::Location>) /proc/self/cwd/bazel-out/k8-dbg/bin/external/llvm-project/mlir/include/mlir/Interfaces/FunctionInterfaces.h.inc:809:7
#12 0x0000560f0be6b536 mlir::detail::FunctionOpInterfaceTrait<mlir::func::FuncOp>::insertArgument(unsigned int, mlir::Type, mlir::DictionaryAttr, mlir::Location) /proc/self/cwd/bazel-out/k8-dbg/bin/external/llvm-project/mlir/include/mlir/Interfaces/FunctionInterfaces.h.inc:796:7
```

Co-authored-by: Matt Arsenault <[email protected]>

…m#127462)

…llvm#127063) recent change e76739e has exposed use after free in GetCursorSource() function that returned pointer to a disposed CXString

…obind`s `stubgen` (llvm#127584) This PR makes several improvements to the stubs that are created by `mlir_(attribute|type|value)_subclass`. First, the PR sets the `__module__` attribute of the classes generated by the nanobind adaptors for attributes, types, and values (via `mlir_(attribute|type|value)_subclass`). By default, the `__module__` property is set to `importlib._bootstrap`, which isn't where we want the new class to live. The new logic sets the property to the name of the module provided as `scope` instead. This also makes nanobind's `stubgen` generate stubs for those classes properly, which ignores classes whose `__module__` does not correspond to the module it is generating stubs for. This resolves llvm#127518. Second, the PR overwrites the function signatures generated by `stubgen` to a format that uses the desired type names (e.g., `mlir.ir.Attribute` instead of `MlirAttribute`). Finally, the PR piggy-backs some minor doc and style improvements to `PythonAdaptors.h`. --------- Signed-off-by: Ingo Müller <[email protected]>

This patch fixes:

clang/tools/c-index-test/c-index-test.c:1240:15: error: mixing declarations and code is a C99 extension [-Werror,-Wdeclaration-after-statement]
clang/tools/c-index-test/c-index-test.c:1367:14: error: mixing declarations and code is a C99 extension [-Werror,-Wdeclaration-after-statement]
clang/tools/c-index-test/c-index-test.c:1468:14: error: mixing declarations and code is a C99 extension [-Werror,-Wdeclaration-after-statement]

This is a (no-op) locale version of strftime.

…#127640) CaptureTracking considers insertions into aggregates and vectors as captures. As such, extractions from aggregates and vectors are escape sources. A non-escaping identified local cannot alias with the result of an extractvalue/extractelement. Fixes llvm#126670.

Reverts llvm#127708

…27705) This patch adds handling of the RISCVISD::VCPOP_VL node in RISCVTargetLowering::computeKnownBitsForTargetNode. It eliminates redundant zero-extension instructions.

…atization (llvm#127754)

Issue: Compilation abnormally terminates in parallel default(private)

Documentation reference: A threadprivate variable must not appear as the base variable of a list item in any clause except for the copyin and copyprivate clauses

Explanation: From the reference, the threadprivate symbols cannot be used in the DSA clauses, which in turn means the symbol can be skipped for default privatization

Fixes llvm#123535

…lvm#125826) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631 and SWDEV-512633

llvm#125827)

…m#127455) Delete `equivalenceAnalysis`, which has been incorporated into the `getAliasingValues` API. Also add an additional test case to ensure that equivalence is properly propagated across function boundaries.

…5836) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631

…patterns. (llvm#127643) Handles both BWI and non-BWI cases (skips PMOV*XBW without BWI). The vector-interleaved-store-i8-stride-8.ll VPTERNLOG diffs are due to better value tracking now recognizing the zero-extension patterns where before it was any-extension

…fran/127840

joaosaffran · 2025-02-26T22:34:52Z

[libc++] Set feature-test macro __cpp_lib_atomic_float (#127559) #1
main

This stack of pull requests is managed by Graphite.

frederick-vs-ja and others added 30 commits February 19, 2025 09:06

[libc++] Set feature-test macro __cpp_lib_atomic_float (llvm#127559)

2207e3e

The corresponding feature was implemented in LLVM 18 (by llvm#67799), but this FTM wasn't added before.
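
How user code might consume this feature-test macro can be sketched as below. This is an illustration under stated assumptions, not code from the patch: `atomic_add` is a hypothetical helper, and the fallback branch is an ordinary compare-exchange loop chosen here for standard libraries that do not define the macro.

```cpp
#include <atomic>
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>  // library feature-test macros, where the header exists
#  endif
#endif

// Atomically adds `delta` to `target` and returns the previous value.
float atomic_add(std::atomic<float> &target, float delta) {
#if defined(__cpp_lib_atomic_float)
  // The library advertises floating-point atomics, so fetch_add is available.
  return target.fetch_add(delta);
#else
  // Fallback: retry a compare-exchange until no other thread interfered.
  // On failure, `expected` is refreshed with the current value.
  float expected = target.load();
  while (!target.compare_exchange_weak(expected, expected + delta)) {
  }
  return expected;
#endif
}
```

Either branch returns the value held before the addition, so callers do not need to know which path was compiled in.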

[lld][WebAssembly] Fix warnings in test. NFC (llvm#127714)

b100c50

[HLSL] Allow arrays to copy-initialize (llvm#127557)

715edd7

This change allows array variables to copy-initialize from other arrays. It also corrects a small error in HLSL C-Style casting that did not error on casting to arrays if elementwise and splat conversions fail. Fixes llvm#127551

[Sparc] Use MCRegister. NFC

27e6561

[NFC][AMDGPU] Pre-commit a test case of checking register coalescer o…

8187caf

…n `v_pk_mov_b32` (llvm#127715) This PR serves as a preliminary step, adding a test case for register coalescer on v_pk_mov_b32. It is intended to demonstrate the code changes introduced in an upcoming PR.

[ELF] Improve INPUT_SECTION_FLAGS test

960b80c

[AMDGPU] Add isAsCheapAsAMove for v_pk_mov_b32 (llvm#127632)

a44284c

Co-authored-by: Matt Arsenault <[email protected]>

[RISCV][VLOPT] Add vfsqrt/vfrsqrt7 instruction to isSupportInstr (llv…

fb39445

…m#127462)

[clang] fix use after free in clang/tools/c-index-test/c-index-test.c (…

1c02c8f

…llvm#127063) recent change e76739e has exposed use after free in GetCursorSource() function that returned pointer to a disposed CXString

[MLIR][emitc]: Remove unused functions (NFC)

c4f8da9

[libc] Add strftime_l (llvm#127708)

9072ba7

This is a (no-op) locale version of strftime.

Revert "[libc] Add strftime_l" (llvm#127766)

a2b4d4e

Reverts llvm#127708

[RISCVISel] Compute leading zeros for RISCVISD::VCPOP_VL node (llvm#1…

b9a1e58

…27705) This patch adds handling of the RISCVISD::VCPOP_VL node in RISCVTargetLowering::computeKnownBitsForTargetNode. It eliminates redundant zero-extension instructions.

[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (l…

a2f9ae1

…lvm#125826) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631 and SWDEV-512633

[AMDGPU] Add missing gfx architectures to AddFlangOffloadRuntime.cmake (

55fb793

llvm#125827)

[AMDGPU][MLIR] Replace gfx940 and gfx941 with gfx942 in MLIR (llvm#12…

8900e41

…5836) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631

joaosaffran added 28 commits February 19, 2025 21:38

fix other functions are checked

4e689e9

adding missing continue

b0d0180

adding few more tests

3c6894f

adding yaml2obj support

08f6ddc

adding support for obj2yaml and initial tests

b232967

multiple parameters support and more testing

1026a8e

clean up

00175bf

fixing formating

9ed2adc

reapply rebase fix

e8252ba

clean up

4de5c29

addressing pr comments

fe13b61

first working version

767b7d0

formating

8434dc2

moving the offset logic to it's own class

d391727

refactoring to remove use of map and string

68c7513

addressing comments

23069ab

using buffer_ostream

d14471b

remove getsize

216341c

clean up

85f012c

clean up

1e2bcf5

clean up

0e277d9

clean up

5cd0044

addressing pr comments

7a7c34d

clean up

d3fafab

Merge branch 'refactor/improve-offset-calculation' into users/joaosaf…

15d1a8c

…fran/127840

clean up

7485640

moving initializer arround

17abc82

addressing pr comments

4b177e2

joaosaffran closed this Feb 26, 2025
