Bring in upstream update (for `main` branch) by peledins-zimperium · Pull Request #6 · Zimperium/llvm-project

peledins-zimperium · 2026-02-10T15:35:51Z

This PR updates the `main` branch to its current upstream state. This would normally be done with the 'Sync fork' button but, for whatever reason, it doesn't work. PRs should be avoided for this purpose in the future.

…lvm#180326) In the MLIR C API headers, clang-tidy’s `modernize-use-using` check reports a large number of type definitions that use `typedef`. In my IDE, this even causes the `typedef` code to be shown as struck through. However, in this case it is clearly not possible to replace them with `using`. This PR suppresses the `modernize-use-using` check for the code inside `extern "C"` blocks.

…egator (llvm#178597) Replace instances of -1ULL, -2ULL, and -3ULL with std::numeric_limits in Bolt DataAggregator Trace constants to address C4146 compiler warning.

Changes:
- BR_ONLY: -1ULL → std::numeric_limits<uint64_t>::max()
- FT_ONLY: -1ULL → std::numeric_limits<uint64_t>::max()
- FT_EXTERNAL_ORIGIN: -2ULL → std::numeric_limits<uint64_t>::max() - 1
- FT_EXTERNAL_RETURN: -3ULL → std::numeric_limits<uint64_t>::max() - 2

Fixes part of llvm#147439
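A minimal sketch of why this replacement preserves the values, assuming the constants are plain `uint64_t` sentinels (the declarations below are illustrative stand-ins, not the actual declarations from the patch). Unsigned arithmetic wraps modulo 2^64, so `max()`, `max() - 1`, and `max() - 2` are the same bit patterns the negated literals produced, without applying unary minus to an unsigned operand:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Illustrative stand-ins for the trace constants named in the commit message.
constexpr uint64_t BR_ONLY = std::numeric_limits<uint64_t>::max();
constexpr uint64_t FT_ONLY = std::numeric_limits<uint64_t>::max();
constexpr uint64_t FT_EXTERNAL_ORIGIN = std::numeric_limits<uint64_t>::max() - 1;
constexpr uint64_t FT_EXTERNAL_RETURN = std::numeric_limits<uint64_t>::max() - 2;

// Binary subtraction from zero wraps to the same values as the old
// -1ULL / -2ULL / -3ULL spellings, but does not use unary minus on an
// unsigned type (the pattern C4146 warns about).
static_assert(BR_ONLY == 0ULL - 1ULL, "same value as -1ULL");
static_assert(FT_ONLY == 0ULL - 1ULL, "same value as -1ULL");
static_assert(FT_EXTERNAL_ORIGIN == 0ULL - 2ULL, "same value as -2ULL");
static_assert(FT_EXTERNAL_RETURN == 0ULL - 3ULL, "same value as -3ULL");
```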

`Zvabd` is the `RISC-V Integer Vector Absolute Difference` extension. It provides 5 instructions:

* `vabs.v`: Vector Signed Integer Absolute.
* `vabd.vv`: Vector Signed Integer Absolute Difference.
* `vabdu.vv`: Vector Unsigned Integer Absolute Difference.
* `vwabda.vv`: Vector Signed Integer Absolute Difference And Accumulate.
* `vwabdau.vv`: Vector Unsigned Integer Absolute Difference And Accumulate.

Doc: https://github.com/riscv/integer-vector-absolute-difference

Reviewers: topperc, lukel97, preames, tclin914, asb, kito-cheng, mshockwave

Pull Request: llvm#180139

We directly lower `ISD::ABDS`/`ISD::ABDU` to `Zvabd` instructions. Note that we only support SEW=8/16 for `vabd.vv`/`vabdu.vv`. Reviewers: mshockwave, lukel97, topperc, preames, tclin914, 4vtomat Reviewed By: lukel97, topperc Pull Request: llvm#180141

…llvm#178909) Currently, Clang only checks arrays and structures for size at a top-level view; that is, it does not consider whether they will fit in the address space when applying the address space attribute. This can lead to situations where a variable is declared in an address space but its type is too large to fit in that address space, leading to potentially invalid modules. This patch proposes a fix for this by checking the size of the type against the maximum size that can be addressed in the given address space when applying the address space attribute. This does not currently handle instantiations of dependent variables, as the attributes are not re-processed at that time. This is planned for further investigation and a follow-up patch.

---------

Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>

We add pseudos/patterns for `vabs.v` instruction and handle the lowering in `RISCVTargetLowering::lowerABS`. Reviewers: topperc, 4vtomat, mshockwave, preames, lukel97, tclin914 Reviewed By: mshockwave Pull Request: llvm#180142

Exactly match the s_wait_event instruction. For some reason we already had this instruction used through llvm.amdgcn.s.wait.event.export.ready, but that hardcodes a specific value. This should really be a bitmask that can combine multiple wait types. gfx11 -> gfx12 broke compatibility in a weird way, by inverting the interpretation of the bit but also shifting the used bit by 1. Simplify the selection of the old intrinsic by just using the magic number 2, which should satisfy both cases.

When `Zvabd` exists, `llvm.abs` is lowered to `vabs.v` so the cost is 1. Reviewers: mshockwave, topperc, lukel97, skachkov-sc, preames Reviewed By: topperc Pull Request: llvm#180146

Currently vector splice intrinsics are costed through getShuffleCost when the offset is fixed. When the offset is variable though we can't use a shuffle mask so it currently returns invalid. This implements the cost in RISCVTTIImpl::getIntrinsicInstrCost as the cost of a slideup and a slidedown, which matches the codegen. It also implements the type based cost whenever the offset argument isn't available. It may be possible to reduce the cost in future when one of the vector operands is known to be poison, in which case we only generate a single slideup or slidedown.

…80440)

Previously this would just print hex values. Print names for the recognized values, matching the sp3 syntax.

@test

…g) (llvm#175976) This PR adds CIR lowering support for predicated SVE `svdup` builtins on AArch64. The corresponding ACLE intrinsics are documented at: https://developer.arm.com/architectures/instruction-sets/intrinsics

This change focuses on the zeroing-predicated variants (suffix `_z`, e.g. `svdup_n_f32_z`), which lower to the LLVM SVE `dup` intrinsic with a `zeroinitializer` passthrough operand.

IMPLEMENTATION NOTES
--------------------
* The CIR type converter is extended to support `BuiltinType::SveBool`, which is lowered to `cir.vector<[16] x i1>`, matching current Clang behaviour and ensuring compatibility with existing LLVM SVE lowering.
* Added logic that converts `cir.vector<[16] x i1>` according to the underlying element type. This is done by calling `@llvm.aarch64.sve.convert.from.svbool`.

TEST NOTES
----------
Compared to the unpredicated `svdup` tests (llvm#174433), the new tests perform more explicit checks to verify:
* Correct argument usage
* Correct return value + type

This helped validate differences between the default Clang lowering and the CIR-based lowering. Once all `svdup` variants are implemented, the tests will be unified.

EXAMPLE LOWERING
----------------
The following example illustrates that CIR lowering produces equivalent LLVM IR to the default Clang path.

Input:
```c
svint8_t test_svdup_n_s8(svbool_t pg, int8_t op) {
  return svdup_n_s8_z(pg, op);
}
```

OUTPUT 1 (default):
```llvm
define dso_local <vscale x 16 x i8> @test(<vscale x 16 x i1> %pg, i8 noundef %op) #0 {
entry:
  %pg.addr = alloca <vscale x 16 x i1>, align 2
  %op.addr = alloca i8, align 1
  store <vscale x 16 x i1> %pg, ptr %pg.addr, align 2
  store i8 %op, ptr %op.addr, align 1
  %0 = load <vscale x 16 x i1>, ptr %pg.addr, align 2
  %1 = load i8, ptr %op.addr, align 1
  %2 = call <vscale x 16 x i8> @llvm.aarch64.sve.dup.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i1> %0, i8 %1)
  ret <vscale x 16 x i8> %2
}
```

OUTPUT 2 (via `-fclangir`):
```llvm
; Function Attrs: noinline
define dso_local <vscale x 16 x i8> @test(<vscale x 16 x i1> %0, i8 %1) #0 {
  %3 = alloca <vscale x 16 x i1>, i64 1, align 2
  %4 = alloca i8, i64 1, align 1
  %5 = alloca <vscale x 16 x i8>, i64 1, align 16
  store <vscale x 16 x i1> %0, ptr %3, align 2
  store i8 %1, ptr %4, align 1
  %6 = load <vscale x 16 x i1>, ptr %3, align 2
  %7 = load i8, ptr %4, align 1
  %8 = call <vscale x 16 x i8> @llvm.aarch64.sve.dup.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i1> %6, i8 %7)
  store <vscale x 16 x i8> %8, ptr %5, align 16
  %9 = load <vscale x 16 x i8>, ptr %5, align 16
  ret <vscale x 16 x i8> %9
}
```

…ext(min/max(x, y)) fold (llvm#180164) If only one of the operands is one-use, the total number of fpexts stays the same, but the min/max is performed on a narrowed type. Additionally, the fpext may fold with a following fptrunc.

The partial check lines while claiming UTC output here were highly confusing. Regenerate the check lines. While here, use a newer version and rename blocks to avoid anon block conflicts.

…vm#177343) Load monitor operations make more sense as atomic operations, as non-atomic operations cannot be used for inter-thread communication w/o additional synchronization. The previous built-in made it work because one could just override the CPol bits, but that bypasses the memory model and forces the user to learn about ISA bits encoding. Making load monitor an atomic operation has a couple of advantages. First, the memory model foundation for it is stronger. We just lean on the existing rules for atomic operations. Second, the CPol bits are abstracted away from the user, which avoids leaking ISA details into the API. This patch also adds supporting memory model and intrinsics documentation to AMDGPUUsage. Solves SWDEV-516398.

llvm#179968) Fold `min/max(fpext x, C)` to `fpext(min/max(x, fptrunc C))` in cases where the truncation of the constant is lossless. This helps eliminate fpext/fptrunc pairs around min/max and addresses the regression from llvm#177988. Proof: https://alive2.llvm.org/ce/z/y_Bcdd
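The "lossless truncation" condition can be illustrated outside the compiler. The sketch below is not the InstCombine implementation; it uses `float` and `double` as the narrow and wide types, the helper names are made up for illustration, and it assumes finite constants within `float` range. A constant qualifies when narrowing it and widening it back yields the same value, and in that case the narrow min followed by a widen agrees with the wide min:

```cpp
#include <cassert>
#include <cmath>

// True when narrowing C to float and widening back loses nothing.
// NaN compares unequal to itself, so NaN constants are rejected here.
bool truncatesLosslessly(double c) {
  return static_cast<double>(static_cast<float>(c)) == c;
}

// Wide form: min(fpext x, C), computed in double.
double minWide(float x, double c) {
  return std::fmin(static_cast<double>(x), c);
}

// Narrow form: fpext(min(x, fptrunc C)), computed in float, then widened.
// Only meaningful when truncatesLosslessly(c) holds.
double minNarrow(float x, double c) {
  return static_cast<double>(std::fmin(x, static_cast<float>(c)));
}
```

For a constant such as `0.5` the two forms agree for every `float` input, whereas `0.1` is not exactly representable as a `float` and so would not qualify for the fold.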

Should use `nnan` flag only.

Closes llvm#174370

…transform pass. (llvm#178134) This PR covers the `mlir::vector::populateFlattenVectorTransferPatterns` as a transform pass.

Ran my python script from llvm#97043 over the repo again and there were 2 duplicate test-cases that have been introduced since I last did this. Also one of the WASM classes had a duplicate method which I just removed.

…llvm#180278) Extract `ArraySectionAnalyzer` from `OptimizedBufferization` into a standalone analysis utility so it can be reused by other passes (e.g., `ScheduleOrderedAssignments`). Also extracts the logic to detect if a designate is using the indices of an elemental operation in storage order. This will be used in WHERE construct optimization in the next patch.

For fixed-length masks we need to AND the result of the whilewr/rw with `ptrue vl*` (which is at least one more instruction).

This case could be turned into powr or pown, so track which case ends up preferred.

Add handling for `STT_TLS` (thread-local storage) symbols in the ELF symbol parsing code. Previously, TLS symbols like `errno` from glibc were not recognized because `STT_TLS` was not handled in the symbol type switch statement. This treats TLS symbols as data symbols (`eSymbolTypeData`), similar to `STT_OBJECT`. The actual TLS address resolution is already implemented in `DynamicLoaderPOSIXDYLD::GetThreadLocalData()` which uses the DWARF `DW_OP_form_tls_address` opcode to calculate thread-local addresses.
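As a rough sketch of the kind of change described, the switch below maps ELF symbol types to a coarse category. The enum and function names are hypothetical stand-ins rather than LLDB's actual types; only the numeric `STT_*` values come from the ELF specification:

```cpp
#include <cassert>
#include <cstdint>

// ELF symbol type values as defined by the ELF specification.
constexpr uint8_t kSttObject = 1; // STT_OBJECT
constexpr uint8_t kSttFunc = 2;   // STT_FUNC
constexpr uint8_t kSttTls = 6;    // STT_TLS

// Hypothetical stand-in for the debugger's symbol categories.
enum class SymbolKind { Unknown, Data, Code };

SymbolKind classifyElfSymbol(uint8_t stType) {
  switch (stType) {
  case kSttObject:
  case kSttTls: // the added case: TLS symbols are treated like data symbols
    return SymbolKind::Data;
  case kSttFunc:
    return SymbolKind::Code;
  default:
    return SymbolKind::Unknown;
  }
}
```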

) Drop the custom shrinking code, which we'll also do for intrinsics. Having libcall-only optimizations is confusing, as these are typically directly emitted as intrinsics by the frontend.

replace magic value `std::numeric_limits<unsigned>::max()` with a named constant `ImpossibleRepairCost` to improve readability

…ex (llvm#179699) The definition for V_INDIRECT_REG_READ_GPR_IDX_B32_V*'s SSrc_b32 operand allows immediates, but the expansion logic handles only register cases now. This can result in expansion failures when e.g. llvm.amdgcn.wave.reduce.umin.i32 is folded into a constant and then used as an insertelement idx.

If the scalar integer selection sources are freely transferable to the FPU, then splat to create an allbits select condition and create a vector select instead

…180300) The dependency actually appears to be unused. Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>

…lvm#180722) No test because I'm not sure how to reproduce this, but this patch fixes `CodeGen/ptrauth-qualifier-function.c`. For function pointer types and function reference types, we use `Pointer`s these days, so we _can_ return them.

Fixes llvm#173543 (comment)

…use a conversion (llvm#179453) We're currently unwrapping `less<T>` even if the `key_type` isn't `T`. This causes the removal of an implicit conversion to `const T&` if the types mismatch. Making `less<T>` transparent in that case changes overload resolution and can make it fail. Fixes llvm#179319

…ss space size (llvm#179625) When a global variable has a size that exceeds the size of the address space it resides in, the verifier should fail as the variable can neither be materialized nor fully accessed. This patch adds a check to the verifier to enforce it.

---------

Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>

RFC: https://discourse.llvm.org/t/how-to-deal-with-abandoned-unmaintained-code/89560

…d/vpmaddubsw/pmulhrsw nodes (llvm#180728) Missing demanded elts handling

This is consistent with all other SPARC test directories.

…de sections (llvm#180411) When merging `.bss` into a code section (e.g., `/MERGE:.bss=.text`), the INT3 gap-filling loop in `writeSections()` would write past the output buffer. This happens because `.bss` chunks have `hasData=false`, so they contribute to `VirtualSize` but not `SizeOfRawData`. The loop was using chunk RVAs without checking if they exceeded the raw data region. This caused a crash on Windows with `/FILEALIGN:1` (access violation 0xC0000005). The tight alignment leaves no slack in the mapped buffer, so the overflow immediately hits unmapped memory. The fix bounds all memset operations to `rawSize` and exits early when encountering chunks beyond the raw data boundary. Fixes llvm#180406
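The shape of the fix can be sketched independently of the linker. The code below is a simplified model, not the actual `writeSections()` code: the `Chunk` struct and `fillGaps` function are hypothetical, chunks are assumed to be sorted by offset, and `0xCC` is the x86 INT3 opcode used as the gap filler. Every `memset` is clamped to `rawSize`, and the loop stops at the first chunk that starts beyond the raw data:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified chunk description (offsets relative to the section).
struct Chunk {
  size_t offset;
  size_t size;
  bool hasData; // false for .bss-like chunks that occupy no raw data
};

// Fill the gaps between chunks with INT3 (0xCC) without ever writing at or
// past rawSize. Assumes `chunks` is sorted by offset.
void fillGaps(uint8_t *buf, size_t rawSize, const std::vector<Chunk> &chunks) {
  assert(buf != nullptr);
  size_t prevEnd = 0;
  for (const Chunk &c : chunks) {
    if (c.offset >= rawSize)
      break; // chunk lies beyond the raw data region: exit early
    if (c.offset > prevEnd)
      std::memset(buf + prevEnd, 0xCC, c.offset - prevEnd);
    prevEnd = std::min(c.offset + c.size, rawSize);
  }
  if (prevEnd < rawSize)
    std::memset(buf + prevEnd, 0xCC, rawSize - prevEnd);
}
```

In this model a trailing chunk with `hasData == false` can extend far past `rawSize` without causing any write outside the first `rawSize` bytes of the buffer.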

llvm#180548) When making a region IsolatedFromAbove, replace uses in any region within the parent region, not just the immediate parent region.

…fullfp16 cases (llvm#180567) Noticed while working on some upcoming generic shuffle handling

…s. (llvm#180718) Conservatively treat unstable pointers as SCEVCouldNotCompute in getPtrToAddrExpr, and return SCEVUnknown when constructing from IR. This surfaced as part of the discussion in llvm#178861. PR: llvm#180718

Model `zext i1 %x to iN` as `select i1 %x, iN 1, iN 0` when there are other select instructions that can be combined with it into a bundle. Fixes llvm#178403 Reviewers: hiraditya, RKSimon Pull Request: llvm#180635

… and mla_mls_merge.ll. NFC

…nt" (llvm#180743) Reverts llvm#180231

…0689)
- urem x, n: result < n (remainder is always less than divisor)
- urem x, n: result <= x (remainder is at most the dividend)
- udiv x, n: result <= x (quotient is at most the dividend)

https://alive2.llvm.org/ce/z/ezzsjQ
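These three range facts are easy to sanity-check by brute force on a small bit width. The sketch below is only an illustration over all 8-bit unsigned inputs, not a substitute for the Alive2 proof, and the function name is made up. The divisor 0 is skipped because division by zero is undefined:

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively check the three facts for every 8-bit x and every nonzero n.
bool rangeFactsHold() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned n = 1; n < 256; ++n) {
      uint8_t r = static_cast<uint8_t>(x % n); // urem x, n
      uint8_t q = static_cast<uint8_t>(x / n); // udiv x, n
      if (!(r < n))  // remainder is always less than the divisor
        return false;
      if (!(r <= x)) // remainder is at most the dividend
        return false;
      if (!(q <= x)) // quotient is at most the dividend
        return false;
    }
  }
  return true;
}
```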

…ddwd/vpmaddubsw/vpmulhrsw vector width reduction (llvm#180738)

This reverts commit 70aebae to fix buildbots https://lab.llvm.org/buildbot/#/builders/85/builds/18614

…l_gather (llvm#180243)
- fixing incorrect assertion and related function name
- MPI_comm_split is not pure
- simplifying/standardizing permutation in all_gather

---------

Co-authored-by: Rolf Morel <rolfmorel@gmail.com>

+      - name: Download source code
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
+        with:
+          ref: ${{ matrix.ref }}
+          repository: ${{ matrix.repo }}
+      - name: Configure


jalopezg-git

LGTM (as in bringing main up to date; this is a fast-forward). Note, however, that no builds happen directly from this branch.

PragmaTwice and others added 30 commits February 9, 2026 13:24

[X86] Optimized ADC + ADD to ADC (llvm#176713)

fad32ff

[RISCV][CodeGen] Lower ISD::ABS to Zvabd instructions

972e73b

We add pseudos/patterns for `vabs.v` instruction and handle the lowering in `RISCVTargetLowering::lowerABS`. Reviewers: topperc, 4vtomat, mshockwave, preames, lukel97, tclin914 Reviewed By: mshockwave Pull Request: llvm#180142

[RISCV][TTI] Adjust the cost of llvm.abs intrinsic when Zvabd exists

e16f354

When `Zvabd` exists, `llvm.abs` is lowered to `vabs.v` so the cost is 1. Reviewers: mshockwave, topperc, lukel97, skachkov-sc, preames Reviewed By: topperc Pull Request: llvm#180146

[PredicateInfo] Fix crash on nonnull assume taking a constant (llvm#1…

6c31bf0

…80440)

AMDGPU: Add syntax for s_wait_event values (llvm#180272)

8554ed7

Previously this would just print hex values. Print names for the recognized values, matching the sp3 syntax.

[PhaseOrdering] Regenerate test checks (NFC)

2ead49f

The partial check lines while claiming UTC output here were highly confusing. Regenerate the check lines. While here, use a newer version and rename blocks to avoid anon block conflicts.

[AMDGPU] Remove NoNaNsFPMath uses (llvm#180469)

25315f2

Should use `nnan` flag only.

[GISel] computeKnownBits - add CTLS handling (llvm#178063)

2298b86

Closes llvm#174370

[mlir][vector] Wrapping populateFlattenVectorTransferPatterns as a …

fe91384

…transform pass. (llvm#178134) This PR covers the `mlir::vector::populateFlattenVectorTransferPatterns` as a transform pass.

[AArch64] Tweak fixed-length loop.dependence.mask costs (llvm#175538)

233a991

For fixed-length masks we need to AND the result of the whilewr/rw with `ptrue vl*` (which is at least one more instruction).

AMDGPU: Add a test for libcall simplify pow handling (llvm#180491)

2ffb543

This case could be turned into powr or pown, so track which case ends up preferred.

[SimplifyLibCalls] Directly convert fmin/fmax to intrinsics (llvm#177988

4ef7be9

) Drop the custom shrinking code, which we'll also do for intrinsics. Having libcall-only optimizations is confusing, as these are typically directly emitted as intrinsics by the frontend.

[GlobalISel] Use named constant for impossible repair cost (llvm#180490)

3862a4f

replace magic value `std::numeric_limits<unsigned>::max()` with a named constant `ImpossibleRepairCost` to improve readability

[X86] Allow handling of i128/256/512 SELECT on the FPU (llvm#180197)

964651a

If the scalar integer selection sources are freely transferable to the FPU, then splat to create an allbits select condition and create a vector select instead

[flang][NFC] Remove dependency on FIRBuilder from FIRAnalysis. (llvm#…

5b2bfce

…180300) The dependency actually appears to be unused. Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>

MacDue and others added 23 commits February 10, 2026 12:08

[IVDesc] Add [[maybe_unused]] to NumNonPHIUsers (NFC) (llvm#180729)

2889098

[X86] Fixed flags issue of onlyZeroFlagUsed (llvm#180405)

6d5bb4d

Fixes llvm#173543 (comment)

[bazel] Port b4032db.

c1a6b13

[mlir][linalg] Remove abandoned Detensorize pass (llvm#177579)

0d37546

RFC: https://discourse.llvm.org/t/how-to-deal-with-abandoned-unmaintained-code/89560

[X86] Add tests showing failure to reduce the vector width of vpmaddw…

dca7b11

…d/vpmaddubsw/pmulhrsw nodes (llvm#180728) Missing demanded elts handling

Rename llvm/test/Transforms/LoopIdiom/Sparc -> /SPARC

b46d6dc

This is consistent with all other SPARC test directories.

[RegionUtils] replace uses in nested regions when isolating from above (

370a571

llvm#180548) When making a region IsolatedFromAbove, replace uses in any region within the parent region, not just the immediate parent region.

[Thumb2] mve-shuffle.ll - add missing check prefix coverage for some …

2f0400c

…fullfp16 cases (llvm#180567) Noticed while working on some upcoming generic shuffle handling

[SLP]Support for zext i1 %x modeling as select %x, 1, 0

70aebae

Model `zext i1 %x to iN` as `select i1 %x, iN 1, iN 0` when there are other select instructions that can be combined with it into a bundle. Fixes llvm#178403 Reviewers: hiraditya, RKSimon Pull Request: llvm#180635

[AArch64][GlobalISel] Add test coverage for arm64-neon-2velem-high.ll…

4dc4abc

… and mla_mls_merge.ll. NFC

Revert "Reapply [Offload][lit] Link against SPIR-V DeviceRTL if prese…

6c0ff8d

…nt" (llvm#180743) Reverts llvm#180231

[X86] SimplifyDemandedVectorEltsForTargetNode - add handling for vpma…

7d2e182

…ddwd/vpmaddubsw/vpmulhrsw vector width reduction (llvm#180738)

[bazel] Port a29f0dd.

1e47ccf

[gn] port a29f0dd (llubi)

22c6b70

[bazel] Port a29f0dd, second attempt.

5e0e389

github-advanced-security AI found potential problems Feb 10, 2026


peledins-zimperium requested a review from jalopezg-git February 10, 2026 15:50

peledins-zimperium assigned jalopezg-git Feb 10, 2026

jalopezg-git changed the title ~~Bring in upstream update~~ Bring in upstream update (for main branch) Feb 10, 2026

jalopezg-git approved these changes Feb 11, 2026


jalopezg-git assigned peledins-zimperium and unassigned jalopezg-git Feb 11, 2026

peledins-zimperium wants to merge 10000 commits into `main` from `bring-in-upstream-update`


20 participants
