Bring in upstream update (for main branch)#6

Open
peledins-zimperium wants to merge 10000 commits into main from bring-in-upstream-update

@peledins-zimperium peledins-zimperium commented Feb 10, 2026

This PR updates the main branch to its current upstream state. This should have been done with the 'Sync fork' button, but for whatever reason it doesn't work. PRs should be avoided for this purpose in the future.

PragmaTwice and others added 30 commits February 9, 2026 13:24
…lvm#180326)

In the MLIR C API headers, clang-tidy’s `modernize-use-using` check
reports a large number of type definitions that use `typedef`. In my
IDE, this even causes the `typedef` code to be shown as struck through.
However, in this case it is clearly not possible to replace them with
`using`. This PR suppresses the `modernize-use-using` check for the code
inside `extern "C"` blocks.
…egator (llvm#178597)

Replace instances of -1ULL, -2ULL, and -3ULL with std::numeric_limits in
Bolt DataAggregator Trace constants to address C4146 compiler warning.

Changes:
- BR_ONLY: -1ULL → std::numeric_limits<uint64_t>::max()
- FT_ONLY: -1ULL → std::numeric_limits<uint64_t>::max()
- FT_EXTERNAL_ORIGIN: -2ULL → std::numeric_limits<uint64_t>::max() - 1
- FT_EXTERNAL_RETURN: -3ULL → std::numeric_limits<uint64_t>::max() - 2

Fixes part of llvm#147439
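
As a rough illustration of the substitution listed above, the sketch below spells out the four sentinels and checks their bit patterns at compile time. The constant names come from the commit message; declaring them at namespace scope and the `static_assert` checks are assumptions made for this example, not the actual BOLT code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sentinels written without unary minus on an unsigned literal,
// which is the construct MSVC warning C4146 complains about.
constexpr uint64_t BR_ONLY = std::numeric_limits<uint64_t>::max();
constexpr uint64_t FT_ONLY = std::numeric_limits<uint64_t>::max();
constexpr uint64_t FT_EXTERNAL_ORIGIN = std::numeric_limits<uint64_t>::max() - 1;
constexpr uint64_t FT_EXTERNAL_RETURN = std::numeric_limits<uint64_t>::max() - 2;

// The values are bit-for-bit what -1ULL, -2ULL and -3ULL used to produce.
static_assert(BR_ONLY == UINT64_C(0xFFFFFFFFFFFFFFFF), "same bits as -1ULL");
static_assert(FT_EXTERNAL_ORIGIN == UINT64_C(0xFFFFFFFFFFFFFFFE), "same bits as -2ULL");
static_assert(FT_EXTERNAL_RETURN == UINT64_C(0xFFFFFFFFFFFFFFFD), "same bits as -3ULL");
```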
The `Zvabd` is for `RISC-V Integer Vector Absolute Difference` and
it provides 5 instructions:

* `vabs.v`: Vector Signed Integer Absolute.
* `vabd.vv`: Vector Signed Integer Absolute Difference.
* `vabdu.vv`: Vector Unsigned Integer Absolute Difference.
* `vwabda.vv`: Vector Signed Integer Absolute Difference And Accumulate.
* `vwabdau.vv`: Vector Unsigned Integer Absolute Difference And Accumulate.

Doc: https://github.com/riscv/integer-vector-absolute-difference
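
To make "absolute difference" concrete, here is a scalar sketch of what one 8-bit lane of `vabdu.vv` and `vabd.vv` would compute. The function names `abdu8` and `abds8` are hypothetical, and the sketch assumes the result keeps the element width (so the signed form wraps into an 8-bit pattern); the widening accumulate forms are not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned absolute difference of two 8-bit elements: |a - b| with both
// operands treated as unsigned (a model of one lane of vabdu.vv).
inline uint8_t abdu8(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(a > b ? a - b : b - a);
}

// Signed absolute difference (a model of one lane of vabd.vv). The
// difference is computed in a wider type and truncated back to 8 bits,
// so |-128 - 127| = 255 is returned as the bit pattern 0xFF.
inline uint8_t abds8(int8_t a, int8_t b) {
  const int diff = static_cast<int>(a) - static_cast<int>(b);
  return static_cast<uint8_t>(diff < 0 ? -diff : diff);
}
```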

Reviewers: topperc, lukel97, preames, tclin914, asb, kito-cheng, mshockwave

Pull Request: llvm#180139
We directly lower `ISD::ABDS`/`ISD::ABDU` to `Zvabd` instructions.

Note that we only support SEW=8/16 for `vabd.vv`/`vabdu.vv`.

Reviewers: mshockwave, lukel97, topperc, preames, tclin914, 4vtomat

Reviewed By: lukel97, topperc

Pull Request: llvm#180141
…llvm#178909)

Currently, Clang only checks arrays and structures for size at a
top-level view, that is it does not consider whether they will fit in
the address space when applying the address space attribute. This can
lead to situations where a variable is declared in an address space but
its type is too large to fit in that address space, leading to
potentially invalid modules.

This patch proposes a fix for this by checking the size of the type
against the maximum size that can be addressed in the given address
space when applying the address space attribute.

This does not currently handle instantiations of dependent variables, as
the attributes are not re-processed at that time. This is planned for
further investigation and a follow-up patch.
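
The kind of check described here can be sketched as follows. This is only an illustration: `fitsInAddressSpace` is a hypothetical helper, and it assumes the limit is the number of distinct byte addresses, `2^pointerBits`; the actual patch may compute the bound differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if an object of sizeInBytes bytes can
// be addressed with pointers that are pointerBits wide.
// Assumption: the address space holds exactly 2^pointerBits bytes.
inline bool fitsInAddressSpace(uint64_t sizeInBytes, unsigned pointerBits) {
  if (pointerBits >= 64)
    return true; // every 64-bit size fits; also avoids an oversized shift
  return sizeInBytes <= (uint64_t{1} << pointerBits);
}
```

With a 16-bit address space, for example, a 65536-byte array would pass this check and a 65537-byte array would be rejected.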

---------

Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
We add pseudos/patterns for `vabs.v` instruction and handle the
lowering in `RISCVTargetLowering::lowerABS`.

Reviewers: topperc, 4vtomat, mshockwave, preames, lukel97, tclin914

Reviewed By: mshockwave

Pull Request: llvm#180142
Exactly match the s_wait_event instruction. For some reason we already
had this instruction used through llvm.amdgcn.s.wait.event.export.ready,
but that hardcodes a specific value. This should really be a bitmask that
can combine multiple wait types.

gfx11 -> gfx12 broke compatibility in a weird way, by inverting the
interpretation of the bit but also shifting the used bit by 1. Simplify
the selection of the old intrinsic by just using the magic number 2,
which should satisfy both cases.
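
One way to see why a single immediate can serve both generations is the model below. The encodings are an assumption chosen to match the description (gfx11: bit 0 set means "do not wait"; gfx12: bit 1 set means "wait"); they are not taken from the ISA documentation, and `Gen`, `waitsForExportReady` and `LegacyExportReadyImm` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class Gen { GFX11, GFX12 }; // hypothetical, for illustration only

// Assumed interpretation of the s_wait_event immediate for "export ready".
inline bool waitsForExportReady(Gen gen, uint16_t imm) {
  switch (gen) {
  case Gen::GFX11:
    return (imm & 0x1) == 0; // assumed: bit 0 set means "don't wait"
  case Gen::GFX12:
    return (imm & 0x2) != 0; // assumed: meaning inverted and moved to bit 1
  }
  return false;
}

// The magic number from the commit message: bit 0 clear, bit 1 set.
constexpr uint16_t LegacyExportReadyImm = 2;
```

Under these assumptions the value 2 requests the wait on both generations, which is the property the old intrinsic needs.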
When `Zvabd` exists, `llvm.abs` is lowered to `vabs.v` so the cost
is 1.

Reviewers: mshockwave, topperc, lukel97, skachkov-sc, preames

Reviewed By: topperc

Pull Request: llvm#180146
Currently vector splice intrinsics are costed through getShuffleCost
when the offset is fixed. When the offset is variable though we can't
use a shuffle mask so it currently returns invalid.

This implements the cost in RISCVTTIImpl::getIntrinsicInstrCost as the
cost of a slideup and a slidedown, which matches the codegen.

It also implements the type based cost whenever the offset argument
isn't available.

It may be possible to reduce the cost in future when one of the vector
operands is known to be poison, in which case we only generate a single
slideup or slidedown.
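
The "slidedown plus slideup" shape that the cost is modelled on can be sketched on plain arrays. This is a scalar illustration, not the RISC-V codegen: `spliceViaSlides` is a hypothetical name, and it assumes a splice that takes the elements of the concatenation of `a` and `b` starting at `offset`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns n elements of concat(a, b) starting at index offset, built from
// two steps that mirror a slidedown of a and a slideup of b.
inline std::vector<int> spliceViaSlides(const std::vector<int>& a,
                                        const std::vector<int>& b,
                                        std::size_t offset) {
  const std::size_t n = a.size();
  assert(b.size() == n && offset <= n);
  std::vector<int> result(n, 0);
  // "slidedown": drop the first offset elements of a.
  for (std::size_t i = 0; i + offset < n; ++i)
    result[i] = a[i + offset];
  // "slideup": move the leading elements of b into the vacated tail.
  for (std::size_t i = n - offset; i < n; ++i)
    result[i] = b[i - (n - offset)];
  return result;
}
```

When `offset` is 0 the second loop does nothing, and when it equals the vector length the first loop does nothing, which is the scalar analogue of needing only a single slide.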
Previously this would just print hex values. Print names for the
recognized values, matching the sp3 syntax.
…g) (llvm#175976)

This PR adds CIR lowering support for predicated SVE `svdup` builtins on
AArch64. The corresponding ACLE intrinsics are documented at:
  https://developer.arm.com/architectures/instruction-sets/intrinsics

This change focuses on the zeroing-predicated variants (suffix `_z`,
e.g. `svdup_n_f32_z`), which lower to the LLVM SVE `dup` intrinsic
with a `zeroinitializer` passthrough operand.

IMPLEMENTATION NOTES
--------------------
* The CIR type converter is extended to support `BuiltinType::SveBool`,
  which is lowered to `cir.vector<[16] x i1>`, matching current Clang
  behaviour and ensuring compatibility with existing LLVM SVE lowering.
* Added logic that converts `cir.vector<[16] x i1>` according to the
  underlying element type. This is done by calling
  `@llvm.aarch64.sve.convert.from.svbool`.

TEST NOTES
----------
Compared to the unpredicated `svdup` tests
(llvm#174433), the new tests
perform more explicit checks to verify:
  * Correct argument usage
  * Correct return value + type

This helped validate differences between the default Clang lowering and
the CIR-based lowering. Once all `svdup` variants are implemented, the tests
will be unified.

EXAMPLE LOWERING
----------------
The following example illustrates that CIR lowering produces equivalent
LLVM IR to the default Clang path.

Input:
```c
svint8_t test_svdup_n_s8(svbool_t pg, int8_t op) {
  return svdup_n_s8_z(pg, op);
}
```

OUTPUT 1 (default):
```llvm
define dso_local <vscale x 16 x i8> @test(<vscale x 16 x i1> %pg, i8 noundef %op) #0 {
entry:
  %pg.addr = alloca <vscale x 16 x i1>, align 2
  %op.addr = alloca i8, align 1
  store <vscale x 16 x i1> %pg, ptr %pg.addr, align 2
  store i8 %op, ptr %op.addr, align 1
  %0 = load <vscale x 16 x i1>, ptr %pg.addr, align 2
  %1 = load i8, ptr %op.addr, align 1
  %2 = call <vscale x 16 x i8> @llvm.aarch64.sve.dup.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i1> %0, i8 %1)
  ret <vscale x 16 x i8> %2
}
```

OUTPUT 2 (via `-fclangir`):
```llvm
; Function Attrs: noinline
define dso_local <vscale x 16 x i8> @test(<vscale x 16 x i1> %0, i8 %1) #0 {
  %3 = alloca <vscale x 16 x i1>, i64 1, align 2
  %4 = alloca i8, i64 1, align 1
  %5 = alloca <vscale x 16 x i8>, i64 1, align 16
  store <vscale x 16 x i1> %0, ptr %3, align 2
  store i8 %1, ptr %4, align 1
  %6 = load <vscale x 16 x i1>, ptr %3, align 2
  %7 = load i8, ptr %4, align 1
  %8 = call <vscale x 16 x i8> @llvm.aarch64.sve.dup.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i1> %6, i8 %7)
  store <vscale x 16 x i8> %8, ptr %5, align 16
  %9 = load <vscale x 16 x i8>, ptr %5, align 16
  ret <vscale x 16 x i8> %9
}
```
…ext(min/max(x, y)) fold (llvm#180164)

If only one of the operands is one-use, the total number of fpexts stays the
same, but the min/max is performed on a narrowed type. Additionally, the
fpext may fold with a following fptrunc.
The partial check lines while claiming UTC output here were
highly confusing. Regenerate the check lines. While here, use a
newer version and rename blocks to avoid anon block conflicts.
…vm#177343)

Load monitor operations make more sense as atomic operations, as
non-atomic operations cannot be used for inter-thread communication w/o
additional synchronization.
The previous built-in made it work because one could just override the
CPol bits, but that bypasses the memory model and forces the user to learn
about ISA bits encoding.

Making load monitor an atomic operation has a couple of advantages.
First, the memory model foundation for it is stronger. We just lean on the
existing rules for atomic operations. Second, the CPol bits are abstracted away
from the user, which avoids leaking ISA details into the API.

This patch also adds supporting memory model and intrinsics
documentation to AMDGPUUsage.

Solves SWDEV-516398.
llvm#179968)

Fold `min/max(fpext x, C)` to `fpext(min/max(x, fptrunc C))` in cases
where the truncation of the constant is lossless.

This helps eliminate fpext/fptrunc pairs around min/max and addresses
the regression from llvm#177988.

Proof: https://alive2.llvm.org/ce/z/y_Bcdd
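
A scalar analogue of the fold, using `float` and `double` in place of the IR types, may help. The helper names are hypothetical, and the sketch only considers finite, in-range values; NaN and signed-zero behaviour are deliberately left out.

```cpp
#include <cassert>
#include <cmath>

// True if the double constant survives a round trip through float,
// i.e. the fptrunc of the constant is lossless.
inline bool truncatesLosslessly(double c) {
  return static_cast<double>(static_cast<float>(c)) == c;
}

// Original form: min(fpext x, C), computed in the wide type.
inline double minWide(float x, double c) {
  return std::fmin(static_cast<double>(x), c);
}

// Folded form: fpext(min(x, fptrunc C)), computed in the narrow type.
inline double minNarrow(float x, double c) {
  return static_cast<double>(std::fmin(x, static_cast<float>(c)));
}
```

For a constant such as `0.5` the two forms agree for any finite `x`, while a constant such as `0.1` fails the lossless test and would not qualify for the fold.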
Should use `nnan` flag only.
…transform pass. (llvm#178134)

This PR exposes `mlir::vector::populateFlattenVectorTransferPatterns`
as a transform pass.
Ran my python script from
llvm#97043 over the repo again and
there were 2 duplicate test-cases that have been introduced since I last
did this.

Also one of the WASM classes had a duplicate method which I just
removed.
…llvm#180278)

Extract `ArraySectionAnalyzer` from `OptimizedBufferization` into a
standalone analysis utility so it can be reused by other passes (e.g.,
`ScheduleOrderedAssignments`).

Also extracts the logic to detect if a designate is using the indices
of an elemental operation in storage order.

This will be used in WHERE construct optimization in the next patch.
For fixed-length masks we need to AND the result of the whilewr/rw with
`ptrue vl*` (which is at least one more instruction).
This case could be turned into powr or pown, so track which
case ends up preferred.
Add handling for `STT_TLS` (thread-local storage) symbols in the ELF
symbol parsing code. Previously, TLS symbols like `errno` from glibc
were not recognized because `STT_TLS` was not handled in the symbol type
switch statement.

This treats TLS symbols as data symbols (`eSymbolTypeData`), similar to
`STT_OBJECT`. The actual TLS address resolution is already implemented
in `DynamicLoaderPOSIXDYLD::GetThreadLocalData()` which uses the DWARF
`DW_OP_form_tls_address` opcode to calculate thread-local addresses.
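
The shape of the change can be sketched as a symbol-type switch with one extra case. This is not the LLDB source: `SymbolKind` and `classifySymbol` are hypothetical stand-ins, and only the numeric ELF symbol-type values (1, 2 and 6) are taken from the ELF specification.

```cpp
#include <cassert>
#include <cstdint>

// ELF symbol types, stored in the low four bits of st_info.
constexpr uint8_t kSttObject = 1;
constexpr uint8_t kSttFunc = 2;
constexpr uint8_t kSttTls = 6;

// Hypothetical stand-in for the debugger's symbol categories.
enum class SymbolKind { Unknown, Code, Data };

inline SymbolKind classifySymbol(uint8_t stInfo) {
  switch (stInfo & 0x0F) { // extract the type, ignoring the binding bits
  case kSttFunc:
    return SymbolKind::Code;
  case kSttObject:
  case kSttTls: // the new case: thread-local symbols are treated as data
    return SymbolKind::Data;
  default:
    return SymbolKind::Unknown;
  }
}
```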
)

Drop the custom shrinking code, which we'll also do for intrinsics.
Having libcall-only optimizations is confusing, as these are typically
directly emitted as intrinsics by the frontend.
Replace the magic value `std::numeric_limits<unsigned>::max()` with a named
constant, `ImpossibleRepairCost`, to improve readability.
…ex (llvm#179699)

The definition for V_INDIRECT_REG_READ_GPR_IDX_B32_V*'s SSrc_b32 operand
allows immediates, but the expansion logic handles only register cases
now. This can result in expansion failures when e.g.
llvm.amdgcn.wave.reduce.umin.i32 is folded into a constant and then used
as an insertelement idx.
If the scalar integer selection sources are freely transferable to the
FPU, then splat to create an allbits select condition and create a
vector select instead
…180300)

The dependency actually appears to be unused.

Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>
MacDue and others added 23 commits February 10, 2026 12:08
…lvm#180722)

No test because I'm not sure how to reproduce this, but this patch fixes
`CodeGen/ptrauth-qualifier-function.c`.

For function pointer types and function reference types, we use
`Pointer`s these days, so we _can_ return them.
…use a conversion (llvm#179453)

We're currently unwrapping `less<T>` even if the `key_type` isn't `T`.
This causes the removal of an implicit conversion to `const T&` if the
types mismatch. Making `less<T>` transparent in that case changes
overload resolution and makes it fail potentially.

Fixes llvm#179319
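
A small, self-contained case where swapping `less<T>` for a transparent comparator breaks overload resolution is sketched below. `Box` and `Key` are hypothetical types invented for the example, and the code assumes C++17 for `std::is_invocable_v`.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// Hypothetical wrapper whose operator< is a template, so it can only be
// selected when the argument type is already some Box<T>.
template <class T> struct Box { T value; };

template <class T> bool operator<(const Box<T>& a, const Box<T>& b) {
  return a.value < b.value;
}

// Hypothetical key type that is only implicitly convertible to Box<int>.
struct Key {
  int id;
  operator Box<int>() const { return Box<int>{id}; }
};

// less<Box<int>> takes const Box<int>&, so each Key is converted first and
// the comparison then runs on two Box<int> values.
inline bool compareViaLessBox(const Key& a, const Key& b) {
  return std::less<Box<int>>{}(a, b);
}

// The transparent less<> compares Key with Key directly; template argument
// deduction for operator< fails, so the call is not even well-formed.
static_assert(std::is_invocable_v<std::less<Box<int>>, const Key&, const Key&>,
              "works through the implicit conversion");
static_assert(!std::is_invocable_v<std::less<>, const Key&, const Key&>,
              "no viable operator< for Key");
```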
…ss space size (llvm#179625)

When a global variable has a size that exceeds the size of the address
space it resides in, the verifier should fail as the variable can
neither be materialized nor fully accessed. This patch adds a check to
the verifier to enforce it.

---------

Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
…d/vpmaddubsw/pmulhrsw nodes (llvm#180728)

Missing demanded elts handling
This is consistent with all other SPARC test directories.
…de sections (llvm#180411)

When merging `.bss` into a code section (e.g., `/MERGE:.bss=.text`), the
INT3 gap-filling loop in `writeSections()` would write past the output
buffer. This happens because `.bss` chunks have `hasData=false`, so they
contribute to `VirtualSize` but not `SizeOfRawData`. The loop was using
chunk RVAs without checking if they exceeded the raw data region.

This caused a crash on Windows with `/FILEALIGN:1` (access violation
0xC0000005). The tight alignment leaves no slack in the mapped buffer,
so the overflow immediately hits unmapped memory.

The fix bounds all memset operations to `rawSize` and exits early when
encountering chunks beyond the raw data boundary.

Fixes llvm#180406
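
The bounding described above can be sketched with a plain byte buffer. This is a simplified model, not the LLD code: `Chunk` and `fillGaps` are hypothetical, chunks are assumed to be sorted by offset, and offsets are relative to the start of the section.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical chunk description: position and size within the section.
struct Chunk {
  std::size_t offset;
  std::size_t size;
  bool hasData; // false for .bss-style chunks with no raw bytes
};

// Builds a zeroed buffer of rawSize bytes and fills the gaps between chunks
// with INT3 (0xCC), never writing at or beyond rawSize.
inline std::vector<uint8_t> fillGaps(std::size_t rawSize,
                                     const std::vector<Chunk>& chunks) {
  const uint8_t int3 = 0xCC;
  std::vector<uint8_t> buf(rawSize, 0);
  std::size_t prevEnd = 0;
  for (const Chunk& c : chunks) {
    if (c.offset >= rawSize)
      break; // chunk lies past the raw data: stop early
    if (c.offset > prevEnd)
      std::fill(buf.begin() + prevEnd, buf.begin() + c.offset, int3);
    // Clamp the chunk end so later fills stay inside the buffer.
    prevEnd = std::max(prevEnd, std::min(c.offset + c.size, rawSize));
  }
  std::fill(buf.begin() + prevEnd, buf.end(), int3);
  return buf;
}
```

In this model a trailing chunk that starts at `rawSize`, like a merged `.bss`, contributes nothing to the buffer and cannot cause a write past its end.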
llvm#180548)

When making a region IsolatedFromAbove, replace uses in any region
within the parent region, not just the immediate parent region.
…fullfp16 cases (llvm#180567)

Noticed while working on some upcoming generic shuffle handling
…s. (llvm#180718)

Conservatively treat unstable pointers as SCEVCouldNotCompute in
getPtrToAddrExpr, and return SCEVUnknown when constructing from IR.

This surfaced as part of the discussion in
llvm#178861.

PR: llvm#180718
Model `zext i1 %x to iN` as `select i1 %x, iN 1, iN 0` when there are
other select instructions that it can be combined with into a bundle.

Fixes llvm#178403

Reviewers: hiraditya, RKSimon

Pull Request: llvm#180635
…0689)

urem x, n: result < n (remainder is always less than divisor)
urem x, n: result <= x (remainder is at most the dividend)
udiv x, n: result <= x (quotient is at most the dividend)

https://alive2.llvm.org/ce/z/ezzsjQ
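
The three bounds can also be sanity-checked by brute force on 8-bit values. This complements the proof link rather than replacing it; `boundsHoldForAllU8` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks the three facts for every 8-bit x and nonzero n.
inline bool boundsHoldForAllU8() {
  for (unsigned x = 0; x <= 0xFF; ++x) {
    for (unsigned n = 1; n <= 0xFF; ++n) {
      const uint8_t rem = static_cast<uint8_t>(x % n);
      const uint8_t quot = static_cast<uint8_t>(x / n);
      if (!(rem < n))   // urem x, n: result < n
        return false;
      if (!(rem <= x))  // urem x, n: result <= x
        return false;
      if (!(quot <= x)) // udiv x, n: result <= x
        return false;
    }
  }
  return true;
}
```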
…ddwd/vpmaddubsw/vpmulhrsw vector width reduction (llvm#180738)
…l_gather (llvm#180243)

- fixing incorrect assertion and related function name
- MPI_comm_split is not pure
- simplifying/standardizing permutation in all_gather

---------

Co-authored-by: Rolf Morel <rolfmorel@gmail.com>
Comment on lines +92 to +97
      - name: Download source code
        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
        with:
          ref: ${{ matrix.ref }}
          repository: ${{ matrix.repo }}
      - name: Configure

Check warning

Code scanning / CodeQL

Checkout of untrusted code in trusted context Medium

Potential unsafe checkout of untrusted pull request on privileged workflow.

@jalopezg-git jalopezg-git changed the title Bring in upstream update Bring in upstream update (for main branch) Feb 10, 2026

@jalopezg-git jalopezg-git left a comment

LGTM (as in bringing main up to date; this is a fast-forward). Note, however, that no builds happen directly from this branch.
