Performance regression with swift-6.2.* #7

@tmcdonell

Description

We see a performance regression when moving from the Swift 6.1 compiler to 6.2. My guess is that 6.2 misses some inlining opportunities, so that by the end of the optimisation pipeline the core loop is no longer in a form that LLVM's loop vectoriser can handle.

Steps to reproduce

This reproduces easily on a single machine using docker/podman. Here I am using my M4 Mac and the official swift:6.2.2-noble and swift:6.1.3-noble images. For example, from the base directory of a fresh checkout (swap in swift:6.2.2-noble for the 6.2 run):

podman run --rm -it -v $PWD:/$(basename $PWD) -w /$(basename $PWD) swift:6.1.3-noble

Filtering the benchmarks down to one representative case is enough to show the difference. Although not strictly necessary, it is best to disable jemalloc in package-benchmark, since it otherwise distorts the results. For example, in the container just launched:

cd Benchmarks
env BENCHMARK_DISABLE_JEMALLOC=true swift package benchmark --filter '.*/move/1000000'
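
The two steps above can also be combined into one non-interactive run per toolchain. The following is only a sketch and not part of the original report: `run_bench` and `DRY_RUN` are hypothetical names, and it assumes it is started from the base directory of the checkout with podman on the PATH. By default it just prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: build (and optionally run) the benchmark command
# for one Swift image tag, e.g. "6.1.3-noble".
run_bench() {
  tag="$1"
  dir="$(basename "$PWD")"
  # Mount the checkout at /<dir>, start in its Benchmarks directory,
  # and pass the jemalloc switch through the container environment.
  set -- podman run --rm \
    -v "$PWD:/$dir" -w "/$dir/Benchmarks" \
    -e BENCHMARK_DISABLE_JEMALLOC=true \
    "swift:$tag" \
    swift package benchmark --filter '.*/move/1000000'
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"   # dry run: only show the command
  else
    "$@"                 # run the benchmark in the container
  fi
}

# Same filter, both toolchains, one after the other.
for tag in 6.1.3-noble 6.2.2-noble; do
  run_bench "$tag"
done
```

Running both images back to back from the same checkout keeps the source tree and host identical, so only the compiler version differs between the two result sets.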

Expected behaviour

The multiarray benchmark should be faster than the array benchmark under both 6.1 and 6.2. Instead, 6.2 shows a regression: multiarray is much slower than under 6.1 (p50 of 1120 μs versus 489 μs) and is now slower than array, which is roughly unchanged (p50 of 879 μs versus 907 μs). The multiarray loop no longer vectorises.

Results on my machine:

swift-6.1.3
root@3dca6f6b25c7:/swift-multiarray/Benchmarks# swift --version
Swift version 6.1.3 (swift-6.1.3-RELEASE)
Target: aarch64-unknown-linux-gnu
root@3dca6f6b25c7:/swift-multiarray/Benchmarks# export BENCHMARK_DISABLE_JEMALLOC=true
root@3dca6f6b25c7:/swift-multiarray/Benchmarks# swift package benchmark --filter '.*/move/1000000'
Building for debugging...
[13/13] Linking BenchmarkTool-tool
Build of product 'BenchmarkTool' complete! (2.54s)
Build complete!
Building BenchmarkTool in release mode...
Building benchmark targets in release mode for benchmark run...
Building Benchmarks

==================
Running Benchmarks
==================

100% [------------------------------------------------------------] ETA: 00:00:00 | Benchmarks:array/move/1000000
100% [------------------------------------------------------------] ETA: 00:00:00 | Benchmarks:multiarray/move/1000000

===================================================
Baseline 'Current_run'
===================================================

Host '3dca6f6b25c7' with 16 'aarch64' processors with 58 GB memory, running:
#1 SMP PREEMPT_DYNAMIC Sat Feb  8 20:30:50 UTC 2025

==========
Benchmarks
==========

array/move/1000000
╒════════════════════════════════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│ Metric                             │        p0 │       p25 │       p50 │       p75 │       p90 │       p99 │      p100 │   Samples │
╞════════════════════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ Memory (allocated resident)        │         0 │         0 │         0 │         0 │         0 │         0 │         0 │     10000 │
├────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ Time (wall clock) (μs) *           │       804 │       903 │       907 │       914 │       927 │       996 │      1173 │     10000 │
╘════════════════════════════════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛

multiarray/move/1000000
╒════════════════════════════════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│ Metric                             │        p0 │       p25 │       p50 │       p75 │       p90 │       p99 │      p100 │   Samples │
╞════════════════════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ Memory (allocated resident)        │         0 │         0 │         0 │         0 │         0 │         0 │         0 │     10000 │
├────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ Time (wall clock) (μs) *           │       435 │       483 │       489 │       495 │       503 │       540 │       645 │     10000 │
╘════════════════════════════════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛

swift-6.2.2
root@ed1fec80eca3:/swift-multiarray/Benchmarks# swift --version
Swift version 6.2.2 (swift-6.2.2-RELEASE)
Target: aarch64-unknown-linux-gnu
root@ed1fec80eca3:/swift-multiarray/Benchmarks# export BENCHMARK_DISABLE_JEMALLOC=true
root@ed1fec80eca3:/swift-multiarray/Benchmarks# swift package benchmark --filter '.*/move/1000000'
Building for debugging...
[7/7] Linking BenchmarkTool-tool
Build of product 'BenchmarkTool' complete! (1.24s)
Build complete!
Building BenchmarkTool in release mode...
Building benchmark targets in release mode for benchmark run...
Building Benchmarks

==================
Running Benchmarks
==================

100% [------------------------------------------------------------] ETA: 00:00:00 | Benchmarks:array/move/1000000
100% [------------------------------------------------------------] ETA: 00:00:00 | Benchmarks:multiarray/move/1000000

===================================================
Baseline 'Current_run'
===================================================

Host 'ed1fec80eca3' with 16 'aarch64' processors with 58 GB memory, running:
#1 SMP PREEMPT_DYNAMIC Sat Feb  8 20:30:50 UTC 2025

==========
Benchmarks
==========

array/move/1000000
╒════════════════════════════════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│ Metric                             │        p0 │       p25 │       p50 │       p75 │       p90 │       p99 │      p100 │   Samples │
╞════════════════════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ Memory (allocated resident)        │         0 │         0 │         0 │         0 │         0 │         0 │         0 │     10000 │
├────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ Time (wall clock) (μs) *           │       803 │       868 │       879 │       890 │       913 │      1003 │      1226 │     10000 │
╘════════════════════════════════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛

multiarray/move/1000000
╒════════════════════════════════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│ Metric                             │        p0 │       p25 │       p50 │       p75 │       p90 │       p99 │      p100 │   Samples │
╞════════════════════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ Memory (allocated resident)        │         0 │         0 │         0 │         0 │         0 │         0 │         0 │      8890 │
├────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ Time (wall clock) (μs) *           │      1020 │      1109 │      1120 │      1133 │      1158 │      1290 │      1526 │      8890 │
╘════════════════════════════════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛
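
For reference, the size of the change can be read off the p50 rows of the tables above. This is a small sketch of the arithmetic; `ratio` is a hypothetical helper and the numbers are copied from the tables.

```shell
# Hypothetical helper: ratio of two p50 wall-clock times (μs), to two decimals.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

ratio 1120 489   # multiarray, 6.2.2 vs 6.1.3 -> 2.29
ratio 879 907    # array, 6.2.2 vs 6.1.3      -> 0.97
```

On these numbers, multiarray takes roughly 2.3 times as long under 6.2.2, while array is within a few percent of its 6.1.3 time.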

Environment

  • M4 Max, macOS 26.0.1
  • podman 5.7.0
