
fix(ci): tighten pre-commit hook to clippy --all-targets #150

Merged
ohdearquant merged 3 commits into main from fix/clippy-all-targets-hook on May 31, 2026

@ohdearquant

Owner

Summary

This completes #87: PR #144 cleaned up the code but left the pre-commit hook unchanged.

Test plan

  • cargo clippy --workspace --all-targets -- -D warnings passes with no warnings
  • The pre-commit hook runs successfully with the new flag

Closes #87

🤖 Generated with Claude Code

Completes #87: adds --all-targets to the pre-commit hook and fixes
the 15 lints it immediately caught in tests and benches from recently
merged PRs (#129, #130, #133, #146).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
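The hook change described above can be sketched as a minimal POSIX sh script. This assumes a plain script installed at .git/hooks/pre-commit; the repository's actual hook file, its location, and any other checks it runs are not shown on this page. The only behavior taken from the PR is the clippy invocation from the test plan, including the --all-targets flag, which extends linting from library and binary targets to tests, benches, and examples.

```shell
#!/bin/sh
# Hypothetical .git/hooks/pre-commit: the path and the absence of other
# checks are assumptions; only the clippy command comes from the PR.
#
# --workspace      lint every crate in the workspace
# --all-targets    also lint tests, benches and examples (the flag this PR adds)
# -- -D warnings   pass -D warnings to clippy itself, so any lint is an error
#
# git aborts the commit when a pre-commit hook exits non-zero; exec makes
# clippy's exit status the hook's exit status.
exec cargo clippy --workspace --all-targets -- -D warnings
```

git only runs the hook if the file is executable (chmod +x .git/hooks/pre-commit).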
@github-actions

github-actions Bot commented May 31, 2026


Perf regression report (ADR-058)

aarch64-linux — perf regression report

⚠ 1 WARN (regression 3.0-7.0% confirmed)
🚀 16 confirmed improvements

Bench Δ point 95% CI new ns base ns verdict
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c +5.62% [+5.59%, +5.64%] 1498.7 1498.7 ⚠ WARN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c -3.03% [-3.20%, -2.85%] 28622.2 28622.2 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c -3.23% [-3.34%, -3.12%] 21758.9 21758.9 🚀 WIN
simd_throughput_384/normalize -3.69% [-3.70%, -3.67%] 118.7 118.7 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_256c -3.78% [-3.89%, -3.67%] 21768.9 21768.9 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c -4.88% [-4.98%, -4.77%] 28527.5 28527.5 🚀 WIN
simd_query_batch_dot_product/simd_batch/128d_64c -4.96% [-5.02%, -4.90%] 525.2 525.2 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/1024d_256c -4.74% [-5.10%, -4.36%] 35982.1 35982.1 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c -4.94% [-5.19%, -4.68%] 35821.4 35821.4 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c -5.05% [-5.34%, -4.74%] 35941.9 35941.9 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c -5.19% [-5.57%, -4.79%] 26857.1 26857.1 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c -5.77% [-5.86%, -5.68%] 28008.5 28008.5 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_16c -6.44% [-6.49%, -6.39%] 959.3 959.3 🚀 WIN
int8_batch_cosine/int8_loop/1000 -7.52% [-7.70%, -7.33%] 19483.3 19483.3 🚀 WIN
simd_query_batch_dot_product/simd_batch/768d_16c -7.90% [-7.92%, -7.87%] 641.7 641.7 🚀 WIN
simd_batch_cosine/simd_batch/1000 -13.39% [-13.46%, -13.31%] 80376.7 80376.7 🚀 WIN
simd_batch_dot_product/simd_batch/1000 -16.51% [-16.58%, -16.46%] 76823.8 76823.8 🚀 WIN
All 259 measurements
Bench Δ point CI-lower CI-upper
add_bias_gelu/4096 -0.01% -0.03% +0.01%
add_bias_gelu/896 +0.06% +0.02% +0.11%
binary_cosine_distance/binary/1024 -0.58% -0.59% -0.56%
binary_cosine_distance/binary/1536 -0.29% -0.31% -0.28%
binary_cosine_distance/binary/384 -0.41% -0.44% -0.39%
binary_cosine_distance/binary/768 +0.28% +0.26% +0.30%
binary_cosine_distance/float32_simd/1024 +0.28% +0.24% +0.33%
binary_cosine_distance/float32_simd/1536 +0.06% +0.05% +0.08%
binary_cosine_distance/float32_simd/384 +0.11% +0.08% +0.13%
binary_cosine_distance/float32_simd/768 +0.40% +0.38% +0.43%
elementwise_mul/4096 -1.02% -1.06% -0.99%
gelu/4096 -0.00% -0.02% +0.02%
gelu/896 +0.01% -0.01% +0.03%
int4_cosine_distance/float32_simd/1024 +0.77% +0.74% +0.79%
int4_cosine_distance/float32_simd/1536 +0.01% +0.00% +0.02%
int4_cosine_distance/float32_simd/384 +0.50% +0.48% +0.52%
int4_cosine_distance/float32_simd/768 +0.28% +0.27% +0.30%
int4_cosine_distance/int4/1024 +0.25% +0.21% +0.28%
int4_cosine_distance/int4/1536 +0.03% +0.00% +0.05%
int4_cosine_distance/int4/384 +0.22% +0.14% +0.31%
int4_cosine_distance/int4/768 +0.16% +0.10% +0.23%
int8_batch_cosine/float32_simd/10 +0.08% +0.07% +0.08%
int8_batch_cosine/float32_simd/100 -0.32% -0.35% -0.30%
int8_batch_cosine/float32_simd/1000 -2.92% -2.99% -2.86%
int8_batch_cosine/int8_loop/10 +0.21% +0.19% +0.23%
int8_batch_cosine/int8_loop/100 +0.77% +0.75% +0.78%
int8_batch_cosine/int8_loop/1000 -7.52% -7.70% -7.33%
int8_prepared_dot_product/per_call/1024 +0.01% -0.01% +0.04%
int8_prepared_dot_product/per_call/127 -0.00% -0.01% +0.01%
int8_prepared_dot_product/per_call/128 +0.00% -0.00% +0.01%
int8_prepared_dot_product/per_call/129 +0.00% -0.01% +0.01%
int8_prepared_dot_product/per_call/384 +0.02% +0.02% +0.04%
int8_prepared_dot_product/per_call/768 -0.01% -0.02% +0.00%
int8_prepared_dot_product/prepared/1024 -0.26% -0.30% -0.22%
int8_prepared_dot_product/prepared/127 +0.18% +0.15% +0.21%
int8_prepared_dot_product/prepared/128 +0.04% -0.01% +0.09%
int8_prepared_dot_product/prepared/129 +0.78% +0.76% +0.80%
int8_prepared_dot_product/prepared/384 +0.37% +0.34% +0.40%
int8_prepared_dot_product/prepared/768 +0.01% -0.07% +0.08%
int8_quantization/quantize/1024 +0.03% +0.02% +0.04%
int8_quantization/quantize/1536 -0.20% -0.21% -0.20%
int8_quantization/quantize/384 +0.02% +0.01% +0.02%
int8_quantization/quantize/768 +0.02% +0.01% +0.02%
int8_raw_dot_product/dot_product_i8/1024 -0.92% -0.96% -0.88%
int8_raw_dot_product/dot_product_i8/127 +0.09% +0.07% +0.11%
int8_raw_dot_product/dot_product_i8/128 +1.78% +1.71% +1.85%
int8_raw_dot_product/dot_product_i8/129 +0.26% +0.21% +0.31%
int8_raw_dot_product/dot_product_i8/384 +1.13% +1.09% +1.17%
int8_raw_dot_product/dot_product_i8/768 +1.43% +1.38% +1.48%
int8_raw_dot_product/dot_product_i8_raw/1024 -0.08% -0.14% -0.02%
int8_raw_dot_product/dot_product_i8_raw/127 +0.28% +0.25% +0.30%
int8_raw_dot_product/dot_product_i8_raw/128 -0.02% -0.04% +0.00%
int8_raw_dot_product/dot_product_i8_raw/129 +0.38% +0.29% +0.48%
int8_raw_dot_product/dot_product_i8_raw/384 +0.06% +0.04% +0.08%
int8_raw_dot_product/dot_product_i8_raw/768 -0.29% -0.32% -0.25%
int8_vs_float32_cosine/float32_simd/1024 +0.06% +0.05% +0.08%
int8_vs_float32_cosine/float32_simd/1536 +0.11% +0.09% +0.12%
int8_vs_float32_cosine/float32_simd/384 -0.98% -1.02% -0.93%
int8_vs_float32_cosine/float32_simd/768 -0.04% -0.07% -0.02%
int8_vs_float32_cosine/int8/1024 -0.94% -0.99% -0.89%
int8_vs_float32_cosine/int8/1536 -0.15% -0.21% -0.07%
int8_vs_float32_cosine/int8/384 +1.16% +1.09% +1.22%
int8_vs_float32_cosine/int8/768 -2.26% -2.36% -2.16%
layer_norm/4096 -0.38% -0.41% -0.36%
layer_norm/896 -0.06% -0.08% -0.04%
memory_size/search_1000_float32 -0.13% -0.18% -0.09%
memory_size/search_1000_int8 -0.79% -0.82% -0.77%
rms_norm/4096 -0.01% -0.06% +0.04%
rms_norm/896 -0.27% -0.30% -0.25%
silu_inplace/4096 +0.01% +0.00% +0.02%
silu_inplace/896 +0.01% +0.00% +0.03%
simd_batch_cosine/scalar_loop/10 -0.02% -0.03% -0.01%
simd_batch_cosine/scalar_loop/100 +0.07% +0.04% +0.10%
simd_batch_cosine/scalar_loop/1000 -0.68% -0.71% -0.65%
simd_batch_cosine/simd_batch/10 +0.02% +0.01% +0.03%
simd_batch_cosine/simd_batch/100 +0.78% +0.76% +0.80%
simd_batch_cosine/simd_batch/1000 -13.39% -13.46% -13.31%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c +0.31% +0.27% +0.35%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c -0.54% -0.56% -0.53%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c -5.05% -5.34% -4.74%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c -0.05% -0.07% -0.03%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c -0.12% -0.13% -0.10%
simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c -0.20% -0.24% -0.15%
simd_batch_cosine_non_normalized_query/pair_loop/384d_16c -0.07% -0.08% -0.06%
simd_batch_cosine_non_normalized_query/pair_loop/384d_256c -0.07% -0.09% -0.06%
simd_batch_cosine_non_normalized_query/pair_loop/384d_4c -0.11% -0.13% -0.09%
simd_batch_cosine_non_normalized_query/pair_loop/384d_64c +0.14% +0.11% +0.17%
simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c +0.76% +0.71% +0.81%
simd_batch_cosine_non_normalized_query/pair_loop/768d_16c -0.27% -0.30% -0.24%
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c -4.88% -4.98% -4.77%
simd_batch_cosine_non_normalized_query/pair_loop/768d_4c -0.13% -0.14% -0.11%
simd_batch_cosine_non_normalized_query/pair_loop/768d_64c -0.32% -0.33% -0.30%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c +0.34% +0.31% +0.37%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c -0.23% -0.24% -0.21%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c -4.94% -5.19% -4.68%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c -0.07% -0.08% -0.06%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c -0.17% -0.19% -0.16%
simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c +0.08% +0.05% +0.11%
simd_batch_cosine_non_normalized_query/simd_batch/384d_16c +0.03% +0.02% +0.05%
simd_batch_cosine_non_normalized_query/simd_batch/384d_256c -0.04% -0.07% -0.01%
simd_batch_cosine_non_normalized_query/simd_batch/384d_4c +0.23% +0.22% +0.24%
simd_batch_cosine_non_normalized_query/simd_batch/384d_64c +0.17% +0.16% +0.18%
simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c +1.07% +1.02% +1.12%
simd_batch_cosine_non_normalized_query/simd_batch/768d_16c -0.60% -0.61% -0.58%
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c -5.77% -5.86% -5.68%
simd_batch_cosine_non_normalized_query/simd_batch/768d_4c -0.10% -0.11% -0.09%
simd_batch_cosine_non_normalized_query/simd_batch/768d_64c -0.27% -0.28% -0.25%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c +0.48% +0.43% +0.53%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c +0.62% +0.61% +0.63%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c -2.98% -3.18% -2.78%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c -0.01% -0.02% -0.00%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c +0.04% +0.03% +0.05%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c -0.37% -0.41% -0.33%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c +0.03% +0.02% +0.05%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c +0.13% +0.09% +0.15%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c +0.01% -0.00% +0.02%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c +0.51% +0.50% +0.52%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c -0.79% -0.83% -0.75%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c +0.57% +0.55% +0.58%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c -3.03% -3.20% -2.85%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c -0.05% -0.05% -0.04%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c +0.25% +0.23% +0.26%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c -0.30% -0.37% -0.23%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c +5.62% +5.59% +5.64%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c -5.19% -5.57% -4.79%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c -0.33% -0.37% -0.30%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c +0.02% -0.02% +0.06%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c -0.32% -0.37% -0.26%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c -0.67% -0.80% -0.53%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c +0.30% +0.28% +0.32%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c -0.99% -1.12% -0.86%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c +1.87% +1.86% +1.88%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c -0.36% -0.42% -0.31%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c +1.38% +1.34% +1.43%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c -3.23% -3.34% -3.12%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c +0.18% +0.13% +0.25%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c +0.47% +0.45% +0.48%
simd_batch_cosine_normalized_query/simd_batch/1024d_1000c +0.36% +0.32% +0.40%
simd_batch_cosine_normalized_query/simd_batch/1024d_16c +1.08% +1.07% +1.09%
simd_batch_cosine_normalized_query/simd_batch/1024d_256c -4.74% -5.10% -4.36%
simd_batch_cosine_normalized_query/simd_batch/1024d_4c -0.04% -0.04% -0.03%
simd_batch_cosine_normalized_query/simd_batch/1024d_64c +0.07% +0.06% +0.09%
simd_batch_cosine_normalized_query/simd_batch/384d_1000c -0.32% -0.36% -0.28%
simd_batch_cosine_normalized_query/simd_batch/384d_16c +0.04% +0.03% +0.05%
simd_batch_cosine_normalized_query/simd_batch/384d_256c +0.11% +0.09% +0.13%
simd_batch_cosine_normalized_query/simd_batch/384d_4c +0.30% +0.29% +0.31%
simd_batch_cosine_normalized_query/simd_batch/384d_64c +0.33% +0.32% +0.34%
simd_batch_cosine_normalized_query/simd_batch/768d_1000c -0.60% -0.64% -0.56%
simd_batch_cosine_normalized_query/simd_batch/768d_16c +0.76% +0.74% +0.77%
simd_batch_cosine_normalized_query/simd_batch/768d_256c -2.51% -2.61% -2.41%
simd_batch_cosine_normalized_query/simd_batch/768d_4c -0.03% -0.04% -0.03%
simd_batch_cosine_normalized_query/simd_batch/768d_64c +0.39% +0.38% +0.40%
simd_batch_dot_product/scalar_loop/10 -0.01% -0.02% +0.00%
simd_batch_dot_product/scalar_loop/100 +0.02% -0.05% +0.08%
simd_batch_dot_product/scalar_loop/1000 -1.24% -1.31% -1.17%
simd_batch_dot_product/simd_batch/10 +1.36% +1.31% +1.42%
simd_batch_dot_product/simd_batch/100 +1.45% +1.44% +1.47%
simd_batch_dot_product/simd_batch/1000 -16.51% -16.58% -16.46%
simd_cosine_similarity/scalar/1024 +0.03% +0.02% +0.05%
simd_cosine_similarity/scalar/1536 +0.04% +0.03% +0.06%
simd_cosine_similarity/scalar/384 +0.16% +0.11% +0.20%
simd_cosine_similarity/scalar/768 +0.05% +0.03% +0.07%
simd_cosine_similarity/simd/1024 +0.95% +0.91% +1.00%
simd_cosine_similarity/simd/1536 +0.14% +0.12% +0.16%
simd_cosine_similarity/simd/384 +0.54% +0.48% +0.60%
simd_cosine_similarity/simd/768 -0.36% -0.40% -0.31%
simd_dot_product/scalar/1024 -0.00% -0.02% +0.01%
simd_dot_product/scalar/1536 -0.00% -0.01% +0.01%
simd_dot_product/scalar/384 +0.07% +0.06% +0.09%
simd_dot_product/scalar/768 -0.73% -0.74% -0.72%
simd_dot_product/simd/1024 +1.04% +0.97% +1.11%
simd_dot_product/simd/1536 +1.55% +1.46% +1.64%
simd_dot_product/simd/384 -2.11% -2.29% -1.94%
simd_dot_product/simd/768 +0.32% +0.21% +0.45%
simd_euclidean_distance/scalar/1024 +0.03% -0.06% +0.09%
simd_euclidean_distance/scalar/1536 -0.02% -0.03% -0.01%
simd_euclidean_distance/scalar/384 +0.02% -0.03% +0.06%
simd_euclidean_distance/scalar/768 -0.01% -0.06% +0.03%
simd_euclidean_distance/simd/1024 -0.04% -0.09% -0.01%
simd_euclidean_distance/simd/1536 +0.00% -0.03% +0.03%
simd_euclidean_distance/simd/384 +0.34% +0.27% +0.39%
simd_euclidean_distance/simd/768 -0.34% -0.40% -0.30%
simd_normalize/scalar/1024 +0.18% -0.02% +0.38%
simd_normalize/scalar/1536 +0.17% +0.02% +0.32%
simd_normalize/scalar/384 -0.07% -0.50% +0.36%
simd_normalize/scalar/768 +0.28% +0.06% +0.48%
simd_normalize/simd/1024 -2.36% -4.41% -0.31%
simd_normalize/simd/1536 -1.70% -3.07% -0.32%
simd_normalize/simd/384 +0.35% -1.97% +2.62%
simd_normalize/simd/768 +0.10% -1.42% +1.71%
simd_normalized_cosine_fast_path/cosine_full/1024 +0.02% -0.00% +0.04%
simd_normalized_cosine_fast_path/cosine_full/384 +0.02% -0.03% +0.07%
simd_normalized_cosine_fast_path/cosine_full/768 +0.38% +0.35% +0.41%
simd_normalized_cosine_fast_path/dot_product/1024 +1.58% +1.48% +1.68%
simd_normalized_cosine_fast_path/dot_product/384 +1.06% +0.87% +1.28%
simd_normalized_cosine_fast_path/dot_product/768 +3.00% +2.91% +3.10%
simd_prepared_query_normalized_cosine/dot_product_loop/1024 -0.87% -0.95% -0.79%
simd_prepared_query_normalized_cosine/dot_product_loop/384 +0.04% -0.06% +0.15%
simd_prepared_query_normalized_cosine/dot_product_loop/768 -1.01% -1.16% -0.87%
simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 +0.78% +0.42% +1.02%
simd_prepared_query_normalized_cosine/prepared_full_cosine/384 +0.67% +0.64% +0.70%
simd_prepared_query_normalized_cosine/prepared_full_cosine/768 +1.00% +0.93% +1.09%
simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 -1.22% -1.28% -1.15%
simd_prepared_query_normalized_cosine/prepared_meta_unit/384 +0.06% -0.01% +0.13%
simd_prepared_query_normalized_cosine/prepared_meta_unit/768 -0.09% -0.17% -0.01%
simd_query_batch_dot_product/pair_loop/128d_16c +0.48% +0.38% +0.58%
simd_query_batch_dot_product/pair_loop/128d_256c +0.66% +0.51% +0.79%
simd_query_batch_dot_product/pair_loop/128d_4c -0.23% -0.31% -0.16%
simd_query_batch_dot_product/pair_loop/128d_64c -2.38% -2.45% -2.33%
simd_query_batch_dot_product/pair_loop/384d_16c -0.01% -0.07% +0.04%
simd_query_batch_dot_product/pair_loop/384d_256c +0.51% +0.46% +0.55%
simd_query_batch_dot_product/pair_loop/384d_4c +1.26% +1.09% +1.42%
simd_query_batch_dot_product/pair_loop/384d_64c +0.28% +0.22% +0.33%
simd_query_batch_dot_product/pair_loop/768d_16c -6.44% -6.49% -6.39%
simd_query_batch_dot_product/pair_loop/768d_256c -3.78% -3.89% -3.67%
simd_query_batch_dot_product/pair_loop/768d_4c -0.48% -0.65% -0.28%
simd_query_batch_dot_product/pair_loop/768d_64c -0.82% -0.88% -0.78%
simd_query_batch_dot_product/simd_batch/128d_16c +0.23% +0.20% +0.27%
simd_query_batch_dot_product/simd_batch/128d_256c +0.42% +0.36% +0.46%
simd_query_batch_dot_product/simd_batch/128d_4c +0.76% +0.71% +0.81%
simd_query_batch_dot_product/simd_batch/128d_64c -4.96% -5.02% -4.90%
simd_query_batch_dot_product/simd_batch/384d_16c +0.52% +0.50% +0.54%
simd_query_batch_dot_product/simd_batch/384d_256c +1.49% +1.44% +1.54%
simd_query_batch_dot_product/simd_batch/384d_4c +0.49% +0.44% +0.55%
simd_query_batch_dot_product/simd_batch/384d_64c +0.66% +0.53% +0.80%
simd_query_batch_dot_product/simd_batch/768d_16c -7.90% -7.92% -7.87%
simd_query_batch_dot_product/simd_batch/768d_256c +2.66% +2.45% +2.90%
simd_query_batch_dot_product/simd_batch/768d_4c +0.18% +0.16% +0.21%
simd_query_batch_dot_product/simd_batch/768d_64c -0.25% -0.36% -0.15%
simd_squared_euclidean_fast_path/euclidean_full/1024 +0.00% -0.03% +0.04%
simd_squared_euclidean_fast_path/euclidean_full/384 -0.04% -0.10% +0.01%
simd_squared_euclidean_fast_path/euclidean_full/768 +0.04% +0.03% +0.05%
simd_squared_euclidean_fast_path/squared_euclidean/1024 -0.21% -0.23% -0.19%
simd_squared_euclidean_fast_path/squared_euclidean/384 +0.09% +0.05% +0.12%
simd_squared_euclidean_fast_path/squared_euclidean/768 +0.08% +0.06% +0.10%
simd_throughput_384/cosine_similarity +0.06% -0.00% +0.11%
simd_throughput_384/dot_product -0.18% -0.26% -0.11%
simd_throughput_384/euclidean_distance +0.06% +0.03% +0.10%
simd_throughput_384/normalize -3.69% -3.70% -3.67%
softmax_attention/128 +0.03% +0.02% +0.05%
softmax_attention/512 -0.10% -0.22% +0.05%
tier_prepared_batch_sizes/int4_batch_prepared/10 -0.07% -0.11% -0.03%
tier_prepared_batch_sizes/int4_batch_prepared/100 +0.75% +0.70% +0.79%
tier_prepared_batch_sizes/int4_batch_prepared/1000 +0.02% -0.01% +0.05%
tier_prepared_batch_sizes/int4_query_per_call/10 +0.47% +0.46% +0.48%
tier_prepared_batch_sizes/int4_query_per_call/100 +0.40% +0.39% +0.41%
tier_prepared_batch_sizes/int4_query_per_call/1000 +0.44% +0.43% +0.44%
tier_prepared_batch_sizes/int8_batch_prepared/10 +0.11% +0.04% +0.17%
tier_prepared_batch_sizes/int8_batch_prepared/100 -0.84% -0.97% -0.73%
tier_prepared_batch_sizes/int8_batch_prepared/1000 +0.59% +0.55% +0.63%
tier_prepared_batch_sizes/int8_query_per_call/10 +0.02% +0.01% +0.03%
tier_prepared_batch_sizes/int8_query_per_call/100 +0.03% +0.02% +0.04%
tier_prepared_batch_sizes/int8_query_per_call/1000 +0.00% -0.02% +0.02%
tier_prepared_query/binary_query_once_1000 +0.20% +0.18% +0.23%
tier_prepared_query/binary_query_per_call_1000 -0.14% -0.15% -0.13%
tier_prepared_query/int4_query_once_1000 +0.13% +0.08% +0.17%
tier_prepared_query/int4_query_per_call_1000 -0.06% -0.07% -0.05%
tier_prepared_query/int8_query_once_1000 -1.75% -1.78% -1.72%
tier_prepared_query/int8_query_per_call_1000 -0.03% -0.04% -0.02%

Rule: CI-lower of change ≤3.0% passes silently; (3.0%, 7.0%] warns; >7.0% fails. Override via PR label bench-allow-regression.
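The pass/warn/fail rule above can be sketched as a small shell function over the lower bound of the 95% CI on the change, given in percent without the % sign (positive means slower). This is an illustration, not the bot's ADR-058 implementation. The function name verdict, the PASS label, and the use of awk for the floating-point comparison are assumptions. The sketch does not model the bench-allow-regression label override or the criterion for a confirmed improvement, which the rule does not state.

```shell
#!/bin/sh
# Hypothetical sketch of the report's regression rule (not the bot's code).
# $1: lower bound of the 95% CI on the change, in percent, without the % sign.
#     Positive values mean the new build is slower.
verdict() {
  awk -v lo="$1" 'BEGIN {
    if (lo + 0 > 7.0)      print "FAIL"   # regression above 7.0% confirmed
    else if (lo + 0 > 3.0) print "WARN"   # lower bound in (3.0%, 7.0%]
    else                   print "PASS"   # lower bound <= 3.0%: passes silently
  }'
}

# CI lower bounds taken from the two reports:
verdict 5.59    # aarch64 pair_loop_dot/1024d_16c  -> WARN
verdict 13.79   # x86_64 elementwise_mul/4096      -> FAIL
verdict -4.71   # x86_64 simd_dot_product/simd/768 -> PASS (the report labels it a WIN)
```

Keying the verdict on the CI lower bound rather than the point estimate means a regression is only flagged when even the most favorable end of the interval exceeds the threshold.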

x86_64-linux — perf regression report

❌ 1 FAIL (regression >7.0% confirmed by 95% CI)
⚠ 1 WARN (regression 3.0-7.0% confirmed)
🚀 257 confirmed improvements

Bench Δ point 95% CI new ns base ns verdict
elementwise_mul/4096 +14.06% [+13.79%, +14.44%] 233.2 233.2 ❌ FAIL
simd_euclidean_distance/simd/1536 +5.49% [+5.27%, +5.65%] 87.5 87.5 ⚠ WARN
simd_dot_product/simd/768 -4.45% [-4.71%, -4.23%] 38.8 38.8 🚀 WIN
simd_euclidean_distance/simd/768 -5.60% [-5.95%, -5.10%] 53.4 53.4 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c -5.90% [-6.07%, -5.66%] 442.3 442.3 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c -6.87% [-7.06%, -6.70%] 1757.3 1757.3 🚀 WIN
simd_normalize/simd/768 -5.05% [-7.90%, -2.63%] 104.9 104.9 🚀 WIN
simd_normalized_cosine_fast_path/dot_product/384 -8.18% [-8.43%, -7.94%] 25.1 25.1 🚀 WIN
simd_prepared_query_normalized_cosine/dot_product_loop/1024 -8.26% [-8.85%, -7.79%] 67868.6 67868.6 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c -8.77% [-8.91%, -8.61%] 6967.5 6967.5 🚀 WIN
layer_norm/896 -8.73% [-9.06%, -8.32%] 155.1 155.1 🚀 WIN
simd_dot_product/simd/384 -8.88% [-9.10%, -8.64%] 25.1 25.1 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c -8.94% [-9.49%, -8.62%] 116.0 116.0 🚀 WIN
simd_normalize/simd/384 -7.07% [-9.78%, -4.06%] 64.2 64.2 🚀 WIN
rms_norm/4096 -9.64% [-9.79%, -9.51%] 724.0 724.0 🚀 WIN
simd_normalize/simd/1536 -8.17% [-10.27%, -6.09%] 179.8 179.8 🚀 WIN
simd_euclidean_distance/simd/384 -10.92% [-11.10%, -10.71%] 30.3 30.3 🚀 WIN
simd_throughput_384/dot_product -11.12% [-11.38%, -10.85%] 24.3 24.3 🚀 WIN
simd_throughput_384/euclidean_distance -11.28% [-11.49%, -11.08%] 30.2 30.2 🚀 WIN
simd_throughput_384/normalize -11.16% [-11.51%, -10.82%] 95.7 95.7 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c -11.29% [-11.66%, -10.99%] 28139.8 28139.8 🚀 WIN
int8_vs_float32_cosine/float32_simd/1536 -11.65% [-11.86%, -11.35%] 105.5 105.5 🚀 WIN
simd_normalize/simd/1024 -10.13% [-11.95%, -8.26%] 126.4 126.4 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c -11.87% [-12.11%, -11.69%] 2228.2 2228.2 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c -12.17% [-12.32%, -12.05%] 35613.8 35613.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c -12.32% [-12.58%, -12.12%] 8843.8 8843.8 🚀 WIN
simd_cosine_similarity/simd/768 -13.01% [-13.20%, -12.85%] 60.1 60.1 🚀 WIN
int8_vs_float32_cosine/float32_simd/768 -13.03% [-13.21%, -12.87%] 60.0 60.0 🚀 WIN
int8_vs_float32_cosine/int8/768 -14.21% [-14.46%, -14.02%] 25.1 25.1 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c -14.54% [-14.81%, -14.32%] 143.7 143.7 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c -15.02% [-15.13%, -14.89%] 549.7 549.7 🚀 WIN
memory_size/search_1000_float32 -16.12% [-16.25%, -16.00%] 33591.6 33591.6 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_full_cosine/384 -16.15% [-16.32%, -16.01%] 34751.7 34751.7 🚀 WIN
simd_batch_cosine/simd_batch/1000 -16.39% [-16.53%, -16.25%] 48191.8 48191.8 🚀 WIN
int8_raw_dot_product/dot_product_i8/768 -16.49% [-16.65%, -16.35%] 22.2 22.2 🚀 WIN
simd_query_batch_dot_product/pair_loop/384d_4c -16.74% [-16.97%, -16.54%] 118.8 118.8 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/384d_64c -16.74% [-16.97%, -16.45%] 2055.9 2055.9 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c -16.73% [-17.04%, -16.33%] 34392.7 34392.7 🚀 WIN
binary_cosine_distance/float32_simd/384 -17.04% [-17.15%, -16.91%] 37.3 37.3 🚀 WIN
simd_prepared_query_normalized_cosine/dot_product_loop/768 -16.96% [-17.17%, -16.83%] 52526.2 52526.2 🚀 WIN
simd_batch_dot_product/simd_batch/1000 -16.93% [-17.21%, -16.68%] 41305.6 41305.6 🚀 WIN
simd_query_batch_dot_product/pair_loop/384d_16c -17.19% [-17.35%, -17.04%] 434.4 434.4 🚀 WIN
simd_query_batch_dot_product/pair_loop/384d_256c -17.28% [-17.40%, -17.20%] 6994.3 6994.3 🚀 WIN
simd_batch_cosine/simd_batch/100 -17.04% [-17.41%, -16.75%] 3580.7 3580.7 🚀 WIN
int8_prepared_dot_product/prepared/768 -17.12% [-17.45%, -16.86%] 22.4 22.4 🚀 WIN
int8_raw_dot_product/dot_product_i8_raw/127 -17.36% [-17.49%, -17.23%] 11.8 11.8 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/384d_64c -17.44% [-17.54%, -17.33%] 2107.2 2107.2 🚀 WIN
simd_query_batch_dot_product/pair_loop/384d_64c -17.38% [-17.56%, -17.22%] 1739.5 1739.5 🚀 WIN
simd_normalized_cosine_fast_path/cosine_full/384 -17.19% [-17.58%, -16.87%] 36.7 36.7 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/384d_256c -17.47% [-17.61%, -17.37%] 8445.5 8445.5 🚀 WIN
simd_prepared_query_normalized_cosine/dot_product_loop/384 -17.08% [-17.62%, -16.64%] 25661.4 25661.4 🚀 WIN
int8_vs_float32_cosine/float32_simd/384 -17.67% [-17.82%, -17.51%] 36.4 36.4 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/384d_1000c -17.44% [-17.84%, -17.12%] 32924.8 32924.8 🚀 WIN
simd_query_batch_dot_product/simd_batch/384d_4c -17.43% [-17.88%, -17.04%] 67.9 67.9 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/384d_256c -17.80% [-17.92%, -17.68%] 8181.8 8181.8 🚀 WIN
simd_throughput_384/cosine_similarity -17.79% [-18.00%, -17.60%] 36.4 36.4 🚀 WIN
int8_batch_cosine/float32_simd/100 -17.62% [-18.00%, -17.32%] 3547.4 3547.4 🚀 WIN
simd_query_batch_dot_product/simd_batch/128d_64c -17.97% [-18.07%, -17.87%] 396.3 396.3 🚀 WIN
int8_batch_cosine/int8_loop/1000 -17.83% [-18.16%, -17.51%] 14944.5 14944.5 🚀 WIN
simd_query_batch_dot_product/simd_batch/384d_16c -18.12% [-18.20%, -18.05%] 224.8 224.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c -18.17% [-18.26%, -18.06%] 3400.0 3400.0 🚀 WIN
int8_raw_dot_product/dot_product_i8/127 -18.18% [-18.32%, -18.05%] 13.4 13.4 🚀 WIN
int4_cosine_distance/float32_simd/384 -18.09% [-18.46%, -17.78%] 37.6 37.6 🚀 WIN
simd_query_batch_dot_product/simd_batch/768d_4c -18.41% [-18.51%, -18.28%] 114.5 114.5 🚀 WIN
int8_raw_dot_product/dot_product_i8_raw/384 -18.26% [-18.60%, -17.97%] 11.0 11.0 🚀 WIN
int8_prepared_dot_product/prepared/384 -18.39% [-18.61%, -18.11%] 12.9 12.9 🚀 WIN
int8_prepared_dot_product/prepared/1024 -18.55% [-18.82%, -18.32%] 28.5 28.5 🚀 WIN
int8_prepared_dot_product/prepared/127 -18.53% [-18.85%, -18.24%] 13.3 13.3 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/384d_16c -18.66% [-18.85%, -18.51%] 528.4 528.4 🚀 WIN
int4_cosine_distance/float32_simd/768 -18.70% [-18.86%, -18.56%] 56.7 56.7 🚀 WIN
simd_query_batch_dot_product/pair_loop/128d_4c -18.68% [-18.95%, -18.43%] 58.0 58.0 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/384d_4c -18.81% [-18.97%, -18.60%] 134.2 134.2 🚀 WIN
int8_batch_cosine/float32_simd/10 -18.76% [-19.00%, -18.55%] 324.8 324.8 🚀 WIN
int8_batch_cosine/float32_simd/1000 -18.52% [-19.03%, -18.03%] 46874.3 46874.3 🚀 WIN
int8_vs_float32_cosine/float32_simd/1024 -18.95% [-19.05%, -18.85%] 69.5 69.5 🚀 WIN
int8_raw_dot_product/dot_product_i8/384 -18.71% [-19.05%, -18.32%] 12.7 12.7 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c -18.90% [-19.19%, -18.64%] 14826.4 14826.4 🚀 WIN
tier_prepared_batch_sizes/int8_batch_prepared/100 -19.08% [-19.25%, -18.83%] 1340.4 1340.4 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_meta_unit/384 -19.19% [-19.29%, -19.09%] 24907.8 24907.8 🚀 WIN
simd_euclidean_distance/simd/1024 -19.40% [-19.51%, -19.28%] 67.8 67.8 🚀 WIN
rms_norm/896 -18.90% [-19.53%, -18.41%] 183.0 183.0 🚀 WIN
int8_batch_cosine/int8_loop/100 -19.41% [-19.70%, -19.14%] 1358.7 1358.7 🚀 WIN
tier_prepared_batch_sizes/int8_batch_prepared/10 -19.53% [-19.79%, -19.22%] 140.5 140.5 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_meta_unit/768 -19.61% [-19.81%, -19.46%] 42984.5 42984.5 🚀 WIN
binary_cosine_distance/float32_simd/768 -19.76% [-19.87%, -19.63%] 55.2 55.2 🚀 WIN
simd_batch_dot_product/simd_batch/10 -19.65% [-19.87%, -19.41%] 243.2 243.2 🚀 WIN
softmax_attention/128 -19.75% [-19.87%, -19.63%] 3848.3 3848.3 🚀 WIN
tier_prepared_batch_sizes/int8_batch_prepared/1000 -19.55% [-19.92%, -19.19%] 13062.7 13062.7 🚀 WIN
memory_size/search_1000_int8 -19.71% [-19.99%, -19.48%] 12575.2 12575.2 🚀 WIN
int4_cosine_distance/float32_simd/1536 -20.01% [-20.15%, -19.85%] 95.6 95.6 🚀 WIN
int8_batch_cosine/int8_loop/10 -19.79% [-20.19%, -19.39%] 135.6 135.6 🚀 WIN
int8_raw_dot_product/dot_product_i8_raw/1024 -19.77% [-20.20%, -19.44%] 24.5 24.5 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/384d_16c -20.01% [-20.24%, -19.76%] 508.2 508.2 🚀 WIN
tier_prepared_query/int8_query_once_1000 -20.26% [-20.59%, -19.84%] 13802.8 13802.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c -20.50% [-20.60%, -20.38%] 58163.7 58163.7 🚀 WIN
int8_raw_dot_product/dot_product_i8_raw/768 -20.39% [-20.64%, -20.18%] 19.1 19.1 🚀 WIN
int8_raw_dot_product/dot_product_i8/1024 -20.53% [-20.72%, -20.38%] 27.5 27.5 🚀 WIN
simd_cosine_similarity/simd/1536 -20.74% [-20.82%, -20.68%] 94.6 94.6 🚀 WIN
simd_query_batch_dot_product/simd_batch/128d_16c -20.53% [-20.89%, -20.19%] 97.3 97.3 🚀 WIN
binary_cosine_distance/float32_simd/1536 -20.74% [-20.89%, -20.58%] 95.6 95.6 🚀 WIN
int8_vs_float32_cosine/int8/384 -20.60% [-20.91%, -20.33%] 14.4 14.4 🚀 WIN
simd_batch_dot_product/simd_batch/100 -20.75% [-20.99%, -20.57%] 2893.6 2893.6 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c -20.81% [-20.99%, -20.64%] 843.0 843.0 🚀 WIN
binary_cosine_distance/float32_simd/1024 -20.87% [-21.03%, -20.74%] 68.7 68.7 🚀 WIN
simd_squared_euclidean_fast_path/euclidean_full/384 -20.96% [-21.11%, -20.81%] 30.7 30.7 🚀 WIN
softmax_attention/512 -21.02% [-21.13%, -20.87%] 59702.4 59702.4 🚀 WIN
simd_batch_dot_product/scalar_loop/1000 -21.02% [-21.14%, -20.92%] 293168.9 293168.9 🚀 WIN
simd_batch_dot_product/scalar_loop/100 -21.06% [-21.22%, -20.83%] 28592.6 28592.6 🚀 WIN
simd_query_batch_dot_product/simd_batch/384d_64c -20.75% [-21.26%, -20.35%] 984.3 984.3 🚀 WIN
simd_batch_cosine/scalar_loop/100 -21.29% [-21.33%, -21.26%] 84467.8 84467.8 🚀 WIN
simd_batch_dot_product/scalar_loop/10 -21.22% [-21.39%, -21.07%] 2847.3 2847.3 🚀 WIN
simd_batch_cosine/simd_batch/10 -21.20% [-21.40%, -21.01%] 322.0 322.0 🚀 WIN
simd_batch_cosine/scalar_loop/10 -21.35% [-21.45%, -21.25%] 8458.0 8458.0 🚀 WIN
add_bias_gelu/4096 -21.38% [-21.50%, -21.22%] 1500.4 1500.4 🚀 WIN
simd_query_batch_dot_product/pair_loop/128d_16c -21.42% [-21.60%, -21.22%] 175.2 175.2 🚀 WIN
simd_batch_cosine/scalar_loop/1000 -21.45% [-21.62%, -21.31%] 847416.8 847416.8 🚀 WIN
simd_euclidean_distance/scalar/384 -21.57% [-21.62%, -21.51%] 298.4 298.4 🚀 WIN
int8_vs_float32_cosine/int8/1536 -21.46% [-21.66%, -21.23%] 40.2 40.2 🚀 WIN
simd_cosine_similarity/scalar/384 -21.63% [-21.69%, -21.59%] 852.5 852.5 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/384d_4c -21.51% [-21.73%, -21.28%] 137.1 137.1 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c -21.57% [-21.76%, -21.40%] 213.3 213.3 🚀 WIN
simd_normalized_cosine_fast_path/cosine_full/768 -21.35% [-21.97%, -20.84%] 54.7 54.7 🚀 WIN
simd_normalize/scalar/1536 -21.45% [-21.98%, -20.87%] 1389.4 1389.4 🚀 WIN
tier_prepared_batch_sizes/int4_batch_prepared/100 -21.69% [-22.02%, -21.22%] 10306.3 10306.3 🚀 WIN
int8_raw_dot_product/dot_product_i8_raw/128 -21.65% [-22.02%, -21.31%] 5.3 5.3 🚀 WIN
tier_prepared_query/int4_query_per_call_1000 -21.90% [-22.02%, -21.76%] 1899501.5 1899501.5 🚀 WIN
simd_normalized_cosine_fast_path/cosine_full/1024 -21.82% [-22.07%, -21.59%] 67.3 67.3 🚀 WIN
tier_prepared_batch_sizes/int4_query_per_call/1000 -21.93% [-22.07%, -21.77%] 1889947.0 1889947.0 🚀 WIN
tier_prepared_batch_sizes/int4_query_per_call/10 -21.90% [-22.07%, -21.67%] 18924.3 18924.3 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/384d_64c -21.99% [-22.09%, -21.90%] 1970.6 1970.6 🚀 WIN
simd_query_batch_dot_product/simd_batch/384d_256c -21.88% [-22.09%, -21.71%] 3942.2 3942.2 🚀 WIN
simd_normalize/scalar/384 -21.83% [-22.10%, -21.54%] 347.8 347.8 🚀 WIN
int8_vs_float32_cosine/int8/1024 -21.94% [-22.10%, -21.76%] 30.8 30.8 🚀 WIN
int8_raw_dot_product/dot_product_i8_raw/129 -21.85% [-22.14%, -21.62%] 5.7 5.7 🚀 WIN
simd_dot_product/simd/1536 -21.94% [-22.15%, -21.78%] 75.7 75.7 🚀 WIN
gelu/896 -21.97% [-22.18%, -21.78%] 310.8 310.8 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c -22.00% [-22.21%, -21.82%] 31766.3 31766.3 🚀 WIN
simd_euclidean_distance/scalar/768 -22.05% [-22.21%, -21.89%] 612.9 612.9 🚀 WIN
simd_euclidean_distance/scalar/1536 -22.19% [-22.22%, -22.16%] 1240.6 1240.6 🚀 WIN
simd_query_batch_dot_product/simd_batch/768d_256c -21.81% [-22.24%, -21.43%] 7702.0 7702.0 🚀 WIN
tier_prepared_batch_sizes/int4_query_per_call/100 -22.01% [-22.25%, -21.82%] 188882.7 188882.7 🚀 WIN
simd_cosine_similarity/scalar/768 -22.14% [-22.28%, -22.04%] 1794.7 1794.7 🚀 WIN
int8_quantization/quantize/768 -22.02% [-22.31%, -21.65%] 3625.3 3625.3 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c -22.13% [-22.32%, -21.90%] 17184.8 17184.8 🚀 WIN
simd_dot_product/scalar/384 -22.22% [-22.32%, -22.10%] 287.9 287.9 🚀 WIN
int8_raw_dot_product/dot_product_i8/129 -21.93% [-22.35%, -21.60%] 7.4 7.4 🚀 WIN
tier_prepared_batch_sizes/int4_batch_prepared/10 -21.92% [-22.36%, -21.57%] 1034.9 1034.9 🚀 WIN
simd_dot_product/scalar/1024 -22.34% [-22.37%, -22.32%] 811.2 811.2 🚀 WIN
simd_cosine_similarity/scalar/1024 -22.25% [-22.44%, -22.13%] 2422.7 2422.7 🚀 WIN
int4_cosine_distance/int4/384 -22.13% [-22.44%, -21.87%] 106.1 106.1 🚀 WIN
binary_cosine_distance/binary/1536 -22.15% [-22.44%, -21.77%] 127.1 127.1 🚀 WIN
int4_cosine_distance/int4/768 -22.24% [-22.44%, -22.05%] 196.3 196.3 🚀 WIN
tier_prepared_query/int8_query_per_call_1000 -22.25% [-22.45%, -22.02%] 1824839.6 1824839.6 🚀 WIN
simd_dot_product/scalar/1536 -22.42% [-22.45%, -22.38%] 1230.0 1230.0 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/384d_256c -22.20% [-22.46%, -21.98%] 7895.7 7895.7 🚀 WIN
tier_prepared_batch_sizes/int4_batch_prepared/1000 -22.12% [-22.47%, -21.87%] 102783.2 102783.2 🚀 WIN
simd_dot_product/scalar/768 -22.44% [-22.47%, -22.39%] 601.7 601.7 🚀 WIN
simd_query_batch_dot_product/pair_loop/128d_256c -21.91% [-22.49%, -21.47%] 2578.5 2578.5 🚀 WIN
int8_prepared_dot_product/per_call/1024 -22.32% [-22.52%, -22.10%] 4834.0 4834.0 🚀 WIN
simd_query_batch_dot_product/simd_batch/128d_256c -22.31% [-22.53%, -22.17%] 1616.3 1616.3 🚀 WIN
simd_squared_euclidean_fast_path/euclidean_full/768 -22.37% [-22.53%, -22.20%] 43.9 43.9 🚀 WIN
int4_cosine_distance/int4/1536 -22.29% [-22.53%, -22.08%] 376.8 376.8 🚀 WIN
add_bias_gelu/896 -22.10% [-22.54%, -21.69%] 326.1 326.1 🚀 WIN
tier_prepared_batch_sizes/int8_query_per_call/100 -22.30% [-22.55%, -22.09%] 181725.0 181725.0 🚀 WIN
int8_prepared_dot_product/per_call/127 -22.27% [-22.57%, -21.92%] 605.3 605.3 🚀 WIN
simd_normalize/scalar/1024 -22.38% [-22.57%, -22.22%] 916.0 916.0 🚀 WIN
simd_normalize/scalar/768 -22.28% [-22.60%, -22.02%] 689.9 689.9 🚀 WIN
int8_prepared_dot_product/per_call/384 -22.51% [-22.60%, -22.41%] 1809.4 1809.4 🚀 WIN
int8_prepared_dot_product/per_call/768 -22.47% [-22.63%, -22.35%] 3620.1 3620.1 🚀 WIN
simd_cosine_similarity/scalar/1536 -22.46% [-22.68%, -22.29%] 3678.8 3678.8 🚀 WIN
binary_cosine_distance/binary/1024 -22.55% [-22.69%, -22.43%] 87.3 87.3 🚀 WIN
simd_dot_product/simd/1024 -22.58% [-22.72%, -22.46%] 51.0 51.0 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c -22.64% [-22.77%, -22.48%] 258.9 258.9 🚀 WIN
simd_euclidean_distance/scalar/1024 -22.51% [-22.83%, -22.26%] 821.8 821.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c -22.68% [-22.85%, -22.52%] 4069.4 4069.4 🚀 WIN
simd_query_batch_dot_product/simd_batch/768d_16c -22.66% [-22.85%, -22.48%] 463.2 463.2 🚀 WIN
tier_prepared_query/binary_query_once_1000 -22.65% [-22.89%, -22.44%] 37511.6 37511.6 🚀 WIN
silu_inplace/896 -22.66% [-22.90%, -22.45%] 2327.9 2327.9 🚀 WIN
int4_cosine_distance/int4/1024 -22.57% [-22.91%, -22.29%] 256.1 256.1 🚀 WIN
int8_prepared_dot_product/per_call/128 -22.63% [-22.91%, -22.40%] 609.4 609.4 🚀 WIN
int8_quantization/quantize/384 -22.62% [-22.92%, -22.37%] 1799.3 1799.3 🚀 WIN
tier_prepared_batch_sizes/int8_query_per_call/1000 -22.37% [-22.94%, -21.90%] 1821339.8 1821339.8 🚀 WIN
int8_quantization/quantize/1536 -22.79% [-23.01%, -22.63%] 7215.9 7215.9 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/384d_16c -22.86% [-23.02%, -22.71%] 493.4 493.4 🚀 WIN
int8_quantization/quantize/1024 -22.69% [-23.04%, -22.42%] 4802.8 4802.8 🚀 WIN
int8_prepared_dot_product/per_call/129 -22.74% [-23.06%, -22.48%] 610.9 610.9 🚀 WIN
gelu/4096 -22.58% [-23.07%, -22.15%] 1422.7 1422.7 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/1024d_4c -22.92% [-23.07%, -22.74%] 255.8 255.8 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c -21.68% [-23.10%, -19.84%] 67785.3 67785.3 🚀 WIN
simd_squared_euclidean_fast_path/squared_euclidean/1024 -22.97% [-23.10%, -22.83%] 51.7 51.7 🚀 WIN
simd_query_batch_dot_product/pair_loop/128d_64c -22.89% [-23.12%, -22.65%] 645.6 645.6 🚀 WIN
tier_prepared_batch_sizes/int8_query_per_call/10 -22.73% [-23.20%, -22.28%] 18262.1 18262.1 🚀 WIN
tier_prepared_query/binary_query_per_call_1000 -22.97% [-23.21%, -22.71%] 690536.4 690536.4 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c -22.86% [-23.25%, -22.37%] 1021.9 1021.9 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/1024d_256c -23.09% [-23.31%, -22.87%] 16819.0 16819.0 🚀 WIN
simd_query_batch_dot_product/simd_batch/768d_64c -23.36% [-23.45%, -23.25%] 1879.3 1879.3 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/1024d_64c -23.35% [-23.51%, -23.25%] 3993.6 3993.6 🚀 WIN
simd_squared_euclidean_fast_path/squared_euclidean/768 -23.33% [-23.53%, -23.15%] 39.9 39.9 🚀 WIN
tier_prepared_query/int4_query_once_1000 -22.83% [-23.54%, -22.32%] 102550.5 102550.5 🚀 WIN
simd_cosine_similarity/simd/1024 -23.42% [-23.56%, -23.29%] 69.6 69.6 🚀 WIN
binary_cosine_distance/binary/768 -23.12% [-23.64%, -22.69%] 67.6 67.6 🚀 WIN
simd_squared_euclidean_fast_path/squared_euclidean/384 -22.61% [-23.66%, -21.72%] 26.7 26.7 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/1024d_16c -23.54% [-23.71%, -23.42%] 1002.1 1002.1 🚀 WIN
simd_squared_euclidean_fast_path/euclidean_full/1024 -23.56% [-23.73%, -23.44%] 54.9 54.9 🚀 WIN
silu_inplace/4096 -23.25% [-23.95%, -22.62%] 10678.7 10678.7 🚀 WIN
binary_cosine_distance/binary/384 -23.19% [-24.08%, -22.55%] 38.2 38.2 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c -23.94% [-24.23%, -23.71%] 208.2 208.2 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/768d_4c -24.27% [-24.40%, -24.16%] 204.9 204.9 🚀 WIN
simd_query_batch_dot_product/simd_batch/128d_4c -23.96% [-24.44%, -23.50%] 35.8 35.8 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/1024d_1000c -24.31% [-24.45%, -24.13%] 64800.0 64800.0 🚀 WIN
simd_normalized_cosine_fast_path/dot_product/768 -24.26% [-24.49%, -24.07%] 38.8 38.8 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c -24.24% [-24.59%, -23.84%] 16936.3 16936.3 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_full_cosine/768 -24.45% [-24.64%, -24.32%] 52172.0 52172.0 🚀 WIN
simd_cosine_similarity/simd/384 -24.50% [-24.66%, -24.35%] 33.2 33.2 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/384d_4c -24.38% [-24.74%, -24.07%] 129.8 129.8 🚀 WIN
int8_prepared_dot_product/prepared/129 -24.67% [-24.95%, -24.30%] 7.2 7.2 🚀 WIN
simd_normalized_cosine_fast_path/dot_product/1024 -24.08% [-24.98%, -23.33%] 50.9 50.9 🚀 WIN
int4_cosine_distance/float32_simd/1024 -24.87% [-25.17%, -24.63%] 65.1 65.1 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c -25.26% [-25.36%, -25.13%] 261.4 261.4 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c -25.31% [-25.49%, -25.10%] 808.1 808.1 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c -25.41% [-25.57%, -25.28%] 178.1 178.1 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c -25.50% [-25.69%, -25.34%] 66551.0 66551.0 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c -25.55% [-25.71%, -25.37%] 52011.4 52011.4 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c -25.68% [-25.87%, -25.55%] 13139.0 13139.0 🚀 WIN
int8_raw_dot_product/dot_product_i8/128 -25.70% [-25.91%, -25.50%] 6.7 6.7 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c -25.72% [-25.94%, -25.50%] 1020.3 1020.3 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 -25.70% [-25.97%, -25.50%] 67019.1 67019.1 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/768d_64c -25.75% [-25.97%, -25.59%] 3224.0 3224.0 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c -25.89% [-26.04%, -25.78%] 1008.7 1008.7 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c -25.87% [-26.12%, -25.59%] 52465.5 52465.5 🚀 WIN
int8_prepared_dot_product/prepared/128 -25.86% [-26.20%, -25.59%] 6.7 6.7 🚀 WIN
simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 -26.03% [-26.22%, -25.85%] 57162.0 57162.0 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/768d_4c -26.12% [-26.23%, -26.00%] 205.7 205.7 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c -25.04% [-26.33%, -23.29%] 3268.3 3268.3 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/768d_16c -26.21% [-26.38%, -26.04%] 801.1 801.1 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c -26.28% [-26.48%, -26.09%] 12871.5 12871.5 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/768d_16c -26.60% [-26.74%, -26.43%] 788.4 788.4 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c -26.22% [-26.76%, -25.82%] 13114.3 13114.3 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/768d_256c -26.59% [-26.78%, -26.39%] 12778.3 12778.3 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_4c -26.63% [-26.80%, -26.48%] 175.3 175.3 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c -26.16% [-26.88%, -25.28%] 65526.8 65526.8 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/768d_1000c -26.89% [-26.97%, -26.81%] 50561.3 50561.3 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c -26.61% [-26.97%, -26.29%] 255.5 255.5 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/768d_16c -26.85% [-26.97%, -26.71%] 785.7 785.7 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c -26.87% [-27.03%, -26.72%] 50674.8 50674.8 🚀 WIN
simd_batch_cosine_normalized_query/simd_batch/768d_64c -26.91% [-27.08%, -26.78%] 3135.7 3135.7 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c -25.52% [-27.11%, -24.22%] 17226.5 17226.5 🚀 WIN
simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c -26.49% [-27.21%, -25.95%] 4037.9 4037.9 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/768d_4c -27.18% [-27.38%, -27.00%] 201.0 201.0 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/768d_64c -27.39% [-27.78%, -27.08%] 3140.9 3140.9 🚀 WIN
simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c -27.20% [-27.85%, -26.69%] 3976.4 3976.4 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c -28.03% [-28.24%, -27.80%] 44534.9 44534.9 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c -28.35% [-28.41%, -28.29%] 11044.4 11044.4 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_256c -28.26% [-28.41%, -28.15%] 11004.1 11004.1 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c -28.80% [-28.87%, -28.74%] 677.7 677.7 🚀 WIN
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c -28.86% [-29.06%, -28.72%] 2695.5 2695.5 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_16c -28.89% [-29.07%, -28.75%] 676.5 676.5 🚀 WIN
simd_query_batch_dot_product/pair_loop/768d_64c -29.52% [-29.59%, -29.46%] 2697.0 2697.0 🚀 WIN
layer_norm/4096 -31.21% [-31.54%, -30.96%] 632.4 632.4 🚀 WIN
All 259 measurements
Bench Δ point CI-lower CI-upper
add_bias_gelu/4096 -21.38% -21.50% -21.22%
add_bias_gelu/896 -22.10% -22.54% -21.69%
binary_cosine_distance/binary/1024 -22.55% -22.69% -22.43%
binary_cosine_distance/binary/1536 -22.15% -22.44% -21.77%
binary_cosine_distance/binary/384 -23.19% -24.08% -22.55%
binary_cosine_distance/binary/768 -23.12% -23.64% -22.69%
binary_cosine_distance/float32_simd/1024 -20.87% -21.03% -20.74%
binary_cosine_distance/float32_simd/1536 -20.74% -20.89% -20.58%
binary_cosine_distance/float32_simd/384 -17.04% -17.15% -16.91%
binary_cosine_distance/float32_simd/768 -19.76% -19.87% -19.63%
elementwise_mul/4096 +14.06% +13.79% +14.44%
gelu/4096 -22.58% -23.07% -22.15%
gelu/896 -21.97% -22.18% -21.78%
int4_cosine_distance/float32_simd/1024 -24.87% -25.17% -24.63%
int4_cosine_distance/float32_simd/1536 -20.01% -20.15% -19.85%
int4_cosine_distance/float32_simd/384 -18.09% -18.46% -17.78%
int4_cosine_distance/float32_simd/768 -18.70% -18.86% -18.56%
int4_cosine_distance/int4/1024 -22.57% -22.91% -22.29%
int4_cosine_distance/int4/1536 -22.29% -22.53% -22.08%
int4_cosine_distance/int4/384 -22.13% -22.44% -21.87%
int4_cosine_distance/int4/768 -22.24% -22.44% -22.05%
int8_batch_cosine/float32_simd/10 -18.76% -19.00% -18.55%
int8_batch_cosine/float32_simd/100 -17.62% -18.00% -17.32%
int8_batch_cosine/float32_simd/1000 -18.52% -19.03% -18.03%
int8_batch_cosine/int8_loop/10 -19.79% -20.19% -19.39%
int8_batch_cosine/int8_loop/100 -19.41% -19.70% -19.14%
int8_batch_cosine/int8_loop/1000 -17.83% -18.16% -17.51%
int8_prepared_dot_product/per_call/1024 -22.32% -22.52% -22.10%
int8_prepared_dot_product/per_call/127 -22.27% -22.57% -21.92%
int8_prepared_dot_product/per_call/128 -22.63% -22.91% -22.40%
int8_prepared_dot_product/per_call/129 -22.74% -23.06% -22.48%
int8_prepared_dot_product/per_call/384 -22.51% -22.60% -22.41%
int8_prepared_dot_product/per_call/768 -22.47% -22.63% -22.35%
int8_prepared_dot_product/prepared/1024 -18.55% -18.82% -18.32%
int8_prepared_dot_product/prepared/127 -18.53% -18.85% -18.24%
int8_prepared_dot_product/prepared/128 -25.86% -26.20% -25.59%
int8_prepared_dot_product/prepared/129 -24.67% -24.95% -24.30%
int8_prepared_dot_product/prepared/384 -18.39% -18.61% -18.11%
int8_prepared_dot_product/prepared/768 -17.12% -17.45% -16.86%
int8_quantization/quantize/1024 -22.69% -23.04% -22.42%
int8_quantization/quantize/1536 -22.79% -23.01% -22.63%
int8_quantization/quantize/384 -22.62% -22.92% -22.37%
int8_quantization/quantize/768 -22.02% -22.31% -21.65%
int8_raw_dot_product/dot_product_i8/1024 -20.53% -20.72% -20.38%
int8_raw_dot_product/dot_product_i8/127 -18.18% -18.32% -18.05%
int8_raw_dot_product/dot_product_i8/128 -25.70% -25.91% -25.50%
int8_raw_dot_product/dot_product_i8/129 -21.93% -22.35% -21.60%
int8_raw_dot_product/dot_product_i8/384 -18.71% -19.05% -18.32%
int8_raw_dot_product/dot_product_i8/768 -16.49% -16.65% -16.35%
int8_raw_dot_product/dot_product_i8_raw/1024 -19.77% -20.20% -19.44%
int8_raw_dot_product/dot_product_i8_raw/127 -17.36% -17.49% -17.23%
int8_raw_dot_product/dot_product_i8_raw/128 -21.65% -22.02% -21.31%
int8_raw_dot_product/dot_product_i8_raw/129 -21.85% -22.14% -21.62%
int8_raw_dot_product/dot_product_i8_raw/384 -18.26% -18.60% -17.97%
int8_raw_dot_product/dot_product_i8_raw/768 -20.39% -20.64% -20.18%
int8_vs_float32_cosine/float32_simd/1024 -18.95% -19.05% -18.85%
int8_vs_float32_cosine/float32_simd/1536 -11.65% -11.86% -11.35%
int8_vs_float32_cosine/float32_simd/384 -17.67% -17.82% -17.51%
int8_vs_float32_cosine/float32_simd/768 -13.03% -13.21% -12.87%
int8_vs_float32_cosine/int8/1024 -21.94% -22.10% -21.76%
int8_vs_float32_cosine/int8/1536 -21.46% -21.66% -21.23%
int8_vs_float32_cosine/int8/384 -20.60% -20.91% -20.33%
int8_vs_float32_cosine/int8/768 -14.21% -14.46% -14.02%
layer_norm/4096 -31.21% -31.54% -30.96%
layer_norm/896 -8.73% -9.06% -8.32%
memory_size/search_1000_float32 -16.12% -16.25% -16.00%
memory_size/search_1000_int8 -19.71% -19.99% -19.48%
rms_norm/4096 -9.64% -9.79% -9.51%
rms_norm/896 -18.90% -19.53% -18.41%
silu_inplace/4096 -23.25% -23.95% -22.62%
silu_inplace/896 -22.66% -22.90% -22.45%
simd_batch_cosine/scalar_loop/10 -21.35% -21.45% -21.25%
simd_batch_cosine/scalar_loop/100 -21.29% -21.33% -21.26%
simd_batch_cosine/scalar_loop/1000 -21.45% -21.62% -21.31%
simd_batch_cosine/simd_batch/10 -21.20% -21.40% -21.01%
simd_batch_cosine/simd_batch/100 -17.04% -17.41% -16.75%
simd_batch_cosine/simd_batch/1000 -16.39% -16.53% -16.25%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_1000c -25.50% -25.69% -25.34%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_16c -25.72% -25.94% -25.50%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_256c -25.52% -27.11% -24.22%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_4c -25.26% -25.36% -25.13%
simd_batch_cosine_non_normalized_query/pair_loop/1024d_64c -26.49% -27.21% -25.95%
simd_batch_cosine_non_normalized_query/pair_loop/384d_1000c -16.73% -17.04% -16.33%
simd_batch_cosine_non_normalized_query/pair_loop/384d_16c -18.66% -18.85% -18.51%
simd_batch_cosine_non_normalized_query/pair_loop/384d_256c -17.47% -17.61% -17.37%
simd_batch_cosine_non_normalized_query/pair_loop/384d_4c -21.51% -21.73% -21.28%
simd_batch_cosine_non_normalized_query/pair_loop/384d_64c -17.44% -17.54% -17.33%
simd_batch_cosine_non_normalized_query/pair_loop/768d_1000c -25.87% -26.12% -25.59%
simd_batch_cosine_non_normalized_query/pair_loop/768d_16c -26.21% -26.38% -26.04%
simd_batch_cosine_non_normalized_query/pair_loop/768d_256c -26.22% -26.76% -25.82%
simd_batch_cosine_non_normalized_query/pair_loop/768d_4c -26.12% -26.23% -26.00%
simd_batch_cosine_non_normalized_query/pair_loop/768d_64c -25.75% -25.97% -25.59%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_1000c -26.16% -26.88% -25.28%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_16c -25.89% -26.04% -25.78%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_256c -24.24% -24.59% -23.84%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_4c -26.61% -26.97% -26.29%
simd_batch_cosine_non_normalized_query/simd_batch/1024d_64c -27.20% -27.85% -26.69%
simd_batch_cosine_non_normalized_query/simd_batch/384d_1000c -22.00% -22.21% -21.82%
simd_batch_cosine_non_normalized_query/simd_batch/384d_16c -22.86% -23.02% -22.71%
simd_batch_cosine_non_normalized_query/simd_batch/384d_256c -22.20% -22.46% -21.98%
simd_batch_cosine_non_normalized_query/simd_batch/384d_4c -24.38% -24.74% -24.07%
simd_batch_cosine_non_normalized_query/simd_batch/384d_64c -21.99% -22.09% -21.90%
simd_batch_cosine_non_normalized_query/simd_batch/768d_1000c -26.87% -27.03% -26.72%
simd_batch_cosine_non_normalized_query/simd_batch/768d_16c -26.85% -26.97% -26.71%
simd_batch_cosine_non_normalized_query/simd_batch/768d_256c -26.28% -26.48% -26.09%
simd_batch_cosine_non_normalized_query/simd_batch/768d_4c -27.18% -27.38% -27.00%
simd_batch_cosine_non_normalized_query/simd_batch/768d_64c -27.39% -27.78% -27.08%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_1000c -21.68% -23.10% -19.84%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_16c -22.86% -23.25% -22.37%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_256c -22.13% -22.32% -21.90%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_4c -22.64% -22.77% -22.48%
simd_batch_cosine_normalized_query/pair_loop_cosine/1024d_64c -22.68% -22.85% -22.52%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_1000c -12.17% -12.32% -12.05%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_16c -15.02% -15.13% -14.89%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_256c -12.32% -12.58% -12.12%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_4c -14.54% -14.81% -14.32%
simd_batch_cosine_normalized_query/pair_loop_cosine/384d_64c -11.87% -12.11% -11.69%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_1000c -25.55% -25.71% -25.37%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_16c -25.31% -25.49% -25.10%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_256c -25.68% -25.87% -25.55%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_4c -23.94% -24.23% -23.71%
simd_batch_cosine_normalized_query/pair_loop_cosine/768d_64c -25.04% -26.33% -23.29%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_1000c -20.50% -20.60% -20.38%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_16c -20.81% -20.99% -20.64%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_256c -18.90% -19.19% -18.64%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_4c -21.57% -21.76% -21.40%
simd_batch_cosine_normalized_query/pair_loop_dot/1024d_64c -18.17% -18.26% -18.06%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_1000c -11.29% -11.66% -10.99%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_16c -5.90% -6.07% -5.66%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_256c -8.77% -8.91% -8.61%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_4c -8.94% -9.49% -8.62%
simd_batch_cosine_normalized_query/pair_loop_dot/384d_64c -6.87% -7.06% -6.70%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_1000c -28.03% -28.24% -27.80%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_16c -28.80% -28.87% -28.74%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_256c -28.35% -28.41% -28.29%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_4c -25.41% -25.57% -25.28%
simd_batch_cosine_normalized_query/pair_loop_dot/768d_64c -28.86% -29.06% -28.72%
simd_batch_cosine_normalized_query/simd_batch/1024d_1000c -24.31% -24.45% -24.13%
simd_batch_cosine_normalized_query/simd_batch/1024d_16c -23.54% -23.71% -23.42%
simd_batch_cosine_normalized_query/simd_batch/1024d_256c -23.09% -23.31% -22.87%
simd_batch_cosine_normalized_query/simd_batch/1024d_4c -22.92% -23.07% -22.74%
simd_batch_cosine_normalized_query/simd_batch/1024d_64c -23.35% -23.51% -23.25%
simd_batch_cosine_normalized_query/simd_batch/384d_1000c -17.44% -17.84% -17.12%
simd_batch_cosine_normalized_query/simd_batch/384d_16c -20.01% -20.24% -19.76%
simd_batch_cosine_normalized_query/simd_batch/384d_256c -17.80% -17.92% -17.68%
simd_batch_cosine_normalized_query/simd_batch/384d_4c -18.81% -18.97% -18.60%
simd_batch_cosine_normalized_query/simd_batch/384d_64c -16.74% -16.97% -16.45%
simd_batch_cosine_normalized_query/simd_batch/768d_1000c -26.89% -26.97% -26.81%
simd_batch_cosine_normalized_query/simd_batch/768d_16c -26.60% -26.74% -26.43%
simd_batch_cosine_normalized_query/simd_batch/768d_256c -26.59% -26.78% -26.39%
simd_batch_cosine_normalized_query/simd_batch/768d_4c -24.27% -24.40% -24.16%
simd_batch_cosine_normalized_query/simd_batch/768d_64c -26.91% -27.08% -26.78%
simd_batch_dot_product/scalar_loop/10 -21.22% -21.39% -21.07%
simd_batch_dot_product/scalar_loop/100 -21.06% -21.22% -20.83%
simd_batch_dot_product/scalar_loop/1000 -21.02% -21.14% -20.92%
simd_batch_dot_product/simd_batch/10 -19.65% -19.87% -19.41%
simd_batch_dot_product/simd_batch/100 -20.75% -20.99% -20.57%
simd_batch_dot_product/simd_batch/1000 -16.93% -17.21% -16.68%
simd_cosine_similarity/scalar/1024 -22.25% -22.44% -22.13%
simd_cosine_similarity/scalar/1536 -22.46% -22.68% -22.29%
simd_cosine_similarity/scalar/384 -21.63% -21.69% -21.59%
simd_cosine_similarity/scalar/768 -22.14% -22.28% -22.04%
simd_cosine_similarity/simd/1024 -23.42% -23.56% -23.29%
simd_cosine_similarity/simd/1536 -20.74% -20.82% -20.68%
simd_cosine_similarity/simd/384 -24.50% -24.66% -24.35%
simd_cosine_similarity/simd/768 -13.01% -13.20% -12.85%
simd_dot_product/scalar/1024 -22.34% -22.37% -22.32%
simd_dot_product/scalar/1536 -22.42% -22.45% -22.38%
simd_dot_product/scalar/384 -22.22% -22.32% -22.10%
simd_dot_product/scalar/768 -22.44% -22.47% -22.39%
simd_dot_product/simd/1024 -22.58% -22.72% -22.46%
simd_dot_product/simd/1536 -21.94% -22.15% -21.78%
simd_dot_product/simd/384 -8.88% -9.10% -8.64%
simd_dot_product/simd/768 -4.45% -4.71% -4.23%
simd_euclidean_distance/scalar/1024 -22.51% -22.83% -22.26%
simd_euclidean_distance/scalar/1536 -22.19% -22.22% -22.16%
simd_euclidean_distance/scalar/384 -21.57% -21.62% -21.51%
simd_euclidean_distance/scalar/768 -22.05% -22.21% -21.89%
simd_euclidean_distance/simd/1024 -19.40% -19.51% -19.28%
simd_euclidean_distance/simd/1536 +5.49% +5.27% +5.65%
simd_euclidean_distance/simd/384 -10.92% -11.10% -10.71%
simd_euclidean_distance/simd/768 -5.60% -5.95% -5.10%
simd_normalize/scalar/1024 -22.38% -22.57% -22.22%
simd_normalize/scalar/1536 -21.45% -21.98% -20.87%
simd_normalize/scalar/384 -21.83% -22.10% -21.54%
simd_normalize/scalar/768 -22.28% -22.60% -22.02%
simd_normalize/simd/1024 -10.13% -11.95% -8.26%
simd_normalize/simd/1536 -8.17% -10.27% -6.09%
simd_normalize/simd/384 -7.07% -9.78% -4.06%
simd_normalize/simd/768 -5.05% -7.90% -2.63%
simd_normalized_cosine_fast_path/cosine_full/1024 -21.82% -22.07% -21.59%
simd_normalized_cosine_fast_path/cosine_full/384 -17.19% -17.58% -16.87%
simd_normalized_cosine_fast_path/cosine_full/768 -21.35% -21.97% -20.84%
simd_normalized_cosine_fast_path/dot_product/1024 -24.08% -24.98% -23.33%
simd_normalized_cosine_fast_path/dot_product/384 -8.18% -8.43% -7.94%
simd_normalized_cosine_fast_path/dot_product/768 -24.26% -24.49% -24.07%
simd_prepared_query_normalized_cosine/dot_product_loop/1024 -8.26% -8.85% -7.79%
simd_prepared_query_normalized_cosine/dot_product_loop/384 -17.08% -17.62% -16.64%
simd_prepared_query_normalized_cosine/dot_product_loop/768 -16.96% -17.17% -16.83%
simd_prepared_query_normalized_cosine/prepared_full_cosine/1024 -25.70% -25.97% -25.50%
simd_prepared_query_normalized_cosine/prepared_full_cosine/384 -16.15% -16.32% -16.01%
simd_prepared_query_normalized_cosine/prepared_full_cosine/768 -24.45% -24.64% -24.32%
simd_prepared_query_normalized_cosine/prepared_meta_unit/1024 -26.03% -26.22% -25.85%
simd_prepared_query_normalized_cosine/prepared_meta_unit/384 -19.19% -19.29% -19.09%
simd_prepared_query_normalized_cosine/prepared_meta_unit/768 -19.61% -19.81% -19.46%
simd_query_batch_dot_product/pair_loop/128d_16c -21.42% -21.60% -21.22%
simd_query_batch_dot_product/pair_loop/128d_256c -21.91% -22.49% -21.47%
simd_query_batch_dot_product/pair_loop/128d_4c -18.68% -18.95% -18.43%
simd_query_batch_dot_product/pair_loop/128d_64c -22.89% -23.12% -22.65%
simd_query_batch_dot_product/pair_loop/384d_16c -17.19% -17.35% -17.04%
simd_query_batch_dot_product/pair_loop/384d_256c -17.28% -17.40% -17.20%
simd_query_batch_dot_product/pair_loop/384d_4c -16.74% -16.97% -16.54%
simd_query_batch_dot_product/pair_loop/384d_64c -17.38% -17.56% -17.22%
simd_query_batch_dot_product/pair_loop/768d_16c -28.89% -29.07% -28.75%
simd_query_batch_dot_product/pair_loop/768d_256c -28.26% -28.41% -28.15%
simd_query_batch_dot_product/pair_loop/768d_4c -26.63% -26.80% -26.48%
simd_query_batch_dot_product/pair_loop/768d_64c -29.52% -29.59% -29.46%
simd_query_batch_dot_product/simd_batch/128d_16c -20.53% -20.89% -20.19%
simd_query_batch_dot_product/simd_batch/128d_256c -22.31% -22.53% -22.17%
simd_query_batch_dot_product/simd_batch/128d_4c -23.96% -24.44% -23.50%
simd_query_batch_dot_product/simd_batch/128d_64c -17.97% -18.07% -17.87%
simd_query_batch_dot_product/simd_batch/384d_16c -18.12% -18.20% -18.05%
simd_query_batch_dot_product/simd_batch/384d_256c -21.88% -22.09% -21.71%
simd_query_batch_dot_product/simd_batch/384d_4c -17.43% -17.88% -17.04%
simd_query_batch_dot_product/simd_batch/384d_64c -20.75% -21.26% -20.35%
simd_query_batch_dot_product/simd_batch/768d_16c -22.66% -22.85% -22.48%
simd_query_batch_dot_product/simd_batch/768d_256c -21.81% -22.24% -21.43%
simd_query_batch_dot_product/simd_batch/768d_4c -18.41% -18.51% -18.28%
simd_query_batch_dot_product/simd_batch/768d_64c -23.36% -23.45% -23.25%
simd_squared_euclidean_fast_path/euclidean_full/1024 -23.56% -23.73% -23.44%
simd_squared_euclidean_fast_path/euclidean_full/384 -20.96% -21.11% -20.81%
simd_squared_euclidean_fast_path/euclidean_full/768 -22.37% -22.53% -22.20%
simd_squared_euclidean_fast_path/squared_euclidean/1024 -22.97% -23.10% -22.83%
simd_squared_euclidean_fast_path/squared_euclidean/384 -22.61% -23.66% -21.72%
simd_squared_euclidean_fast_path/squared_euclidean/768 -23.33% -23.53% -23.15%
simd_throughput_384/cosine_similarity -17.79% -18.00% -17.60%
simd_throughput_384/dot_product -11.12% -11.38% -10.85%
simd_throughput_384/euclidean_distance -11.28% -11.49% -11.08%
simd_throughput_384/normalize -11.16% -11.51% -10.82%
softmax_attention/128 -19.75% -19.87% -19.63%
softmax_attention/512 -21.02% -21.13% -20.87%
tier_prepared_batch_sizes/int4_batch_prepared/10 -21.92% -22.36% -21.57%
tier_prepared_batch_sizes/int4_batch_prepared/100 -21.69% -22.02% -21.22%
tier_prepared_batch_sizes/int4_batch_prepared/1000 -22.12% -22.47% -21.87%
tier_prepared_batch_sizes/int4_query_per_call/10 -21.90% -22.07% -21.67%
tier_prepared_batch_sizes/int4_query_per_call/100 -22.01% -22.25% -21.82%
tier_prepared_batch_sizes/int4_query_per_call/1000 -21.93% -22.07% -21.77%
tier_prepared_batch_sizes/int8_batch_prepared/10 -19.53% -19.79% -19.22%
tier_prepared_batch_sizes/int8_batch_prepared/100 -19.08% -19.25% -18.83%
tier_prepared_batch_sizes/int8_batch_prepared/1000 -19.55% -19.92% -19.19%
tier_prepared_batch_sizes/int8_query_per_call/10 -22.73% -23.20% -22.28%
tier_prepared_batch_sizes/int8_query_per_call/100 -22.30% -22.55% -22.09%
tier_prepared_batch_sizes/int8_query_per_call/1000 -22.37% -22.94% -21.90%
tier_prepared_query/binary_query_once_1000 -22.65% -22.89% -22.44%
tier_prepared_query/binary_query_per_call_1000 -22.97% -23.21% -22.71%
tier_prepared_query/int4_query_once_1000 -22.83% -23.54% -22.32%
tier_prepared_query/int4_query_per_call_1000 -21.90% -22.02% -21.76%
tier_prepared_query/int8_query_once_1000 -20.26% -20.59% -19.84%
tier_prepared_query/int8_query_per_call_1000 -22.25% -22.45% -22.02%

Rule: if the lower bound of the 95% CI on the change is ≤3.0%, the bench passes silently; (3.0%, 7.0%] warns; >7.0% fails. Override with the PR label bench-allow-regression.

The gate is in advisory mode (Rollout step 3, ADR-058 §Rollout): failures do not block merge for the first 7 days.
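The gate rule and advisory window above can be sketched in Rust as follows. This is a minimal sketch, not the ADR-058 implementation. It assumes the input is the lower bound of the 95% CI on the relative change, in percent, with positive values meaning slower. The names `Verdict`, `classify`, and `blocks_merge`, and the `advisory` and `allow_regression_label` parameters, are hypothetical. The sketch covers only the regression bands and does not model how the report assigns a 🚀 WIN.

```rust
/// Verdict for one benchmark under the regression rule (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Pass,
    Warn,
    Fail,
}

/// Classify a benchmark from the lower bound of the 95% CI on its change,
/// in percent (positive = regression). Bands follow the rule text:
/// <= 3.0% passes, (3.0%, 7.0%] warns, > 7.0% fails.
fn classify(ci_lower_pct: f64) -> Verdict {
    if ci_lower_pct <= 3.0 {
        Verdict::Pass
    } else if ci_lower_pct <= 7.0 {
        Verdict::Warn
    } else {
        Verdict::Fail
    }
}

/// Whether a verdict blocks merge. `advisory` stands in for the rollout
/// window in which failures are reported but not enforced, and
/// `allow_regression_label` for the bench-allow-regression PR label.
/// Both parameters are hypothetical. Warnings never block.
fn blocks_merge(v: Verdict, advisory: bool, allow_regression_label: bool) -> bool {
    v == Verdict::Fail && !advisory && !allow_regression_label
}

fn main() {
    // Hypothetical CI lower bounds (percent) chosen to hit each band edge.
    for lower in [2.9, 3.0, 3.01, 7.0, 7.01] {
        let v = classify(lower);
        println!(
            "ci_lower={lower:+.2}% -> {v:?}; blocks merge: advisory={}, enforced={}",
            blocks_merge(v, true, false),
            blocks_merge(v, false, false),
        );
    }
}
```

Keying the verdict on the CI lower bound rather than the point estimate means a bench only warns or fails when the whole interval sits above the threshold, so noisy runs with wide intervals are less likely to trip the gate.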

ohdearquant and others added 2 commits May 30, 2026 23:33
Criterion panics with --baseline when a bench group has no prior data
(e.g., newly added groups). --save-baseline saves new data AND compares
against existing data if present, without panicking on missing groups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Criterion --quick uses fewer samples — enough to detect direction and
magnitude for a PR gate, not tight CIs. Full runs are for local
bench-compare before submitting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ohdearquant merged commit 8476ba4 into main on May 31, 2026 (5 of 6 checks passed)

Development

Linked issue (closed by this PR): ci(hook): tighten pre-commit clippy to --all-targets
