Skip to content

perf: fuse fvar substitution into instantiateMVars#12233

Draft
nomeata wants to merge 13 commits intomasterfrom
joachim/instantiateMVarsNoUpdate
Draft

perf: fuse fvar substitution into instantiateMVars#12233
nomeata wants to merge 13 commits intomasterfrom
joachim/instantiateMVarsNoUpdate

Conversation

@nomeata
Copy link
Collaborator

@nomeata nomeata commented Jan 29, 2026

This PR adds instantiateAllMVars and instantiateAllMVarsSharing, two-pass variants of instantiateMVars for call sites where all metavariables are known to be assigned. They fail fast (returning none) on unassigned metavariables instead of leaving them in place or calling the expensive elimMVarDeps machinery.

Two-pass architecture

  • Pass 1 (instantiate_direct_fn): Standard instantiateMVars-like traversal that resolves direct mvar assignments with write-back and a single persistent cache. For delayed assignments, it pre-normalizes the pending value (resolving its direct chain) but leaves the delayed mvar application in the expression.

  • Pass 2 (instantiate_delayed_fn): Fused traversal that resolves delayed assignments by carrying an fvar substitution map. When encountering ?m [x₁,…,xₖ] := ?pending, it extends the substitution with [xᵢ ↦ argᵢ] and continues into the pending value. Since pass 1 has pre-normalized all direct chains, each pending value is compact and visited once.

Sharing preservation (instantiateAllMVarsSharing)

The fused fvar-substitution approach can lose sharing across delayed-mvar boundaries: when a shared subexpression (like @Eq Nat s s) appears both inside a delayed mvar's value and in a type argument, the cache scope swap at visit_delayed entry causes each occurrence to independently produce the same result, yielding distinct objects.

instantiateAllMVarsSharing solves this with a scope-tracked cache stack:

  • Each visit_delayed call pushes a new cache scope, and records the scope level in each fvar substitution entry
  • m_result_scope tracks the maximum fvar-scope used by each subtree (via save/reset/restore in visit())
  • Results are cached at the outermost valid scope level (determined by m_result_scope), so entries that only depend on outer-scope fvars persist when inner scopes are popped
  • Lookup scans the stack from outermost to innermost (typically 1-3 levels deep)

This achieves O(n²) output sharing matching instantiateMVars, vs O(n³) without sharing preservation.

Benchmark results (pathological nested-delayed-mvar case)

Method n=50 n=100 n=200
allMVarsSharing 0ms / 2,030 objs 0ms / 6,555 objs 1ms / 23,105 objs
instantiateMVars 1ms / 1,981 objs 5ms / 6,456 objs 37ms / 22,906 objs
allMVars (no sharing) 1ms / 50,932 objs 12ms / 369,357 objs 92ms / 2,808,707 objs
noUpdateLean 3ms / 50,932 objs 33ms / 369,357 objs 257ms / 2,808,707 objs
noUpdate (C++) 25ms / 50,932 objs 420ms / 369,357 objs 7,336ms / 2,808,707 objs

Files

  • src/kernel/instantiate_mvars_all.cpp — C++ implementation of both passes
  • src/Lean/Meta/InstMVarsAll.lean — Lean wrappers for instantiateAllMVars and instantiateAllMVarsSharing
  • tests/lean/run/instantiateAllMVarsSharing.lean — Minimal test verifying sharing behavior of both variants

🤖 Generated with Claude Code

@nomeata nomeata added the changelog-language Language features and metaprograms label Jan 29, 2026
@nomeata
Copy link
Collaborator Author

nomeata commented Jan 29, 2026

!radar

@leanprover-radar
Copy link

leanprover-radar commented Jan 29, 2026

Benchmark results for c83f527 against 9e18eea are in! @nomeata

  • 🟥 build//instructions: +4.7G (+0.0%)

Large changes (1✅, 2🟥)

  • 🟥 big_struct//instructions: +3.0G (+15.0%)
  • big_struct_dep//instructions: -3.8G (-18.4%)
  • 🟥 big_struct_dep1//instructions: +9.3G (+22.7%)

Small changes (7🟥)

  • 🟥 build/module/Init.Grind.Config//instructions: +53.5M (+2.8%) (reduced significance based on absolute threshold)
  • 🟥 build/module/Init.MetaTypes//instructions: +35.6M (+1.6%)
  • 🟥 build/module/Lake.Config.Package//instructions: +12.8M (+0.5%)
  • 🟥 build/module/Lake.Config.PackageConfig//instructions: +36.0M (+0.6%)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.Arith.CommRing.Types//instructions: +25.2M (+0.9%)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.Arith.Cutsat.ToIntInfo//instructions: +16.3M (+0.9%)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.Arith.Linear.Types//instructions: +80.9M (+1.5%)

@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Jan 29, 2026
@leanprover-bot
Copy link
Collaborator

leanprover-bot commented Jan 29, 2026

Reference manual CI status:

  • ❗ Reference manual CI can not be attempted yet, as the nightly-testing-2026-01-28 tag does not exist there yet. We will retry when you push more commits. If you rebase your branch onto nightly-with-manual, reference manual CI should run now. You can force reference manual CI using the force-manual-ci label. (2026-01-29 17:27:26)
  • ❗ Reference manual CI can not be attempted yet, as the nightly-testing-2026-02-25 tag does not exist there yet. We will retry when you push more commits. If you rebase your branch onto nightly-with-manual, reference manual CI should run now. You can force reference manual CI using the force-manual-ci label. (2026-02-28 00:29:59)

ghost pushed a commit to leanprover-community/batteries that referenced this pull request Jan 29, 2026
@github-actions github-actions bot added the mathlib4-nightly-available A branch for this PR exists at leanprover-community/mathlib4-nightly-testing:lean-pr-testing-NNNN label Jan 29, 2026
ghost pushed a commit to leanprover-community/mathlib4-nightly-testing that referenced this pull request Jan 29, 2026
@ghost ghost added the builds-mathlib CI has verified that Mathlib builds against this PR label Jan 29, 2026
@ghost
Copy link

ghost commented Jan 29, 2026

Mathlib CI status (docs):

nomeata and others added 2 commits February 26, 2026 23:23
…date

This adds an experimental `instantiateMVarsNoUpdate2` variant (inline in the
benchmark) that keeps its expression cache across binder depths using
bvar-blind keying, rather than clearing the cache at each depth.

Bench2 shows it avoids the O(depth * bodySize) scaling of the old approach
when the same closed sub-expression appears at many depths (crossover ~depth=50).
However, the ~6ms constant per-node overhead makes it 3-5x slower on the
bench1 workload (the realistic delayed-assignment case), where both variants
are already sub-millisecond. Not worth adopting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This PR adds a C++ implementation of `instantiateMVarsNoUpdate`, following the
pattern of the existing `instantiate_mvars.cpp`. Key features:

- FVar substitution via hash map instead of `replace_fvars`, avoiding a
  separate traversal pass for delayed assignments
- Depth tracking with lazy lifting of substitution values
- Two-level cache: a depth-dependent cache (cleared under binders) and a
  global cache for fvar-free expressions that persists across binder depths
- When the fvar substitution is empty, updates mvar assignments with their
  normalized values so subsequent `instantiateMVars` calls are free

Benchmark (delayed-assignment workload):
- C++ NoUpdate is ~2x faster than the Lean NoUpdate implementation
- C++ NoUpdate is 10-26x faster than standard instantiateMVars
- The global cache eliminates O(depth) scaling for fvar-free expressions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nomeata nomeata force-pushed the joachim/instantiateMVarsNoUpdate branch from 267f6b4 to 860b8fd Compare February 27, 2026 10:50
This PR makes instantiateMVarsNoUpdate the default implementation behind
lean_instantiate_expr_mvars. The NoUpdate variant carries a free variable
substitution during traversal instead of doing repeated replace_fvars,
avoiding quadratic behavior in terms with many delayed-assigned metavariables.

Key fixes for drop-in compatibility:
- Use name_hash_map for fvar substitution (structural name equality, not
  pointer comparison)
- Always call apply_beta with preserve_data=false for regular mvar
  assignments to strip mdata wrappers like _recApp
- Clear depth-dependent cache when extending fvar substitution in
  visit_delayed
- Handle unassigned mvars with active fvar substitution by creating
  delayed assignments via elimMVarDeps
- Match standard instantiateMVars guards for delayed assignment resolution

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nomeata
Copy link
Collaborator Author

nomeata commented Feb 27, 2026

!radar

@leanprover-radar
Copy link

leanprover-radar commented Feb 27, 2026

Benchmark results for c5d6267 against 9e18eea are in! @nomeata

Warning

These warnings may indicate that the benchmark results are not directly comparable, for example due to changes in the runner configuration or hardware.

  • Bench repo commit hashes for run build differ between commits.
  • Bench repo commit hashes for run other differ between commits.
  • 🟥 build//instructions: +87.5G (+0.67%)

Large changes (7🟥)

  • 🟥 build/module/Std.Data.Iterators.Lemmas.Combinators.Monadic.FilterMap//instructions: +5.1G (+88.93%)
  • 🟥 build/module/Std.Tactic.BVDecide.Bitblast.BVExpr.Circuit.Lemmas.Expr//instructions: +8.6G (+6.71%)
  • 🟥 elab/big_struct//instructions: +10.7G (+53.17%)
  • 🟥 elab/big_struct_dep//instructions: +4.1G (+19.94%)
  • 🟥 elab/big_struct_dep1//instructions: +20.1G (+48.89%)
  • 🟥 elab/grind_bitvec2//instructions: +3.4G (+1.54%)
  • 🟥 elab/omega_stress//instructions: +259.9M (+5.04%)

Medium changes (19🟥)

  • 🟥 build/module/Init.Data.Iterators.Lemmas.Combinators.Monadic.FilterMap//instructions: +4.6G (+9.62%) (reduced significance based on absolute threshold)
  • 🟥 build/module/Init.Data.Range.Polymorphic.RangeIterator//instructions: +3.2G (+13.85%) (reduced significance based on absolute threshold)
  • 🟥 build/module/Init.Data.String.Decode//instructions: +1.1G (+3.20%) (reduced significance based on absolute threshold)
  • 🟥 build/module/Std.Data.DTreeMap.Internal.Balancing//instructions: +2.1G (+2.61%)
  • 🟥 build/module/Std.Data.DTreeMap.Internal.Model//instructions: +1.6G (+2.24%)
  • 🟥 build/module/Std.Data.DTreeMap.Internal.Operations//instructions: +1.5G (+2.96%) (reduced significance based on absolute threshold)
  • 🟥 build/module/Std.Data.DTreeMap.Internal.WF.Lemmas//instructions: +1.4G (+2.93%)
  • 🟥 build/module/Std.Data.Internal.List.Associative//instructions: +1.1G (+1.21%)
  • 🟥 build/module/Std.Data.Iterators.Lemmas.Combinators.Zip//instructions: +4.2G (+38.48%) (reduced significance based on absolute threshold)
  • 🟥 build/module/Std.Tactic.BVDecide.Bitblast.BVExpr.Circuit.Impl.Expr//instructions: +1.8G (+2.43%) (reduced significance based on absolute threshold)
  • 🟥 build/module/Std.Tactic.BVDecide.LRAT.Internal.Formula.RupAddResult//instructions: +2.1G (+3.77%) (reduced significance based on absolute threshold)
  • 🟥 build/module/Std.Tactic.BVDecide.LRAT.Internal.Formula.RupAddSound//instructions: +1.3G (+6.14%) (reduced significance based on absolute threshold)
  • 🟥 elab/big_match_partial//instructions: +137.2M (+0.76%)
  • 🟥 elab/bv_decide_rewriter//instructions: +91.0M (+0.77%)
  • 🟥 elab/grind_list2//instructions: +563.6M (+0.92%)
  • 🟥 elab/mut_rec_wf//instructions: +418.6M (+1.36%)
  • 🟥 misc/import Init.Data.BitVec.Lemmas//instructions: +819.4M (+0.56%)
  • 🟥 misc/import Init.Data.List.Sublist//instructions: +191.2M (+1.66%)
  • 🟥 misc/import Std.Data.Internal.List.Associative//instructions: +1.1G (+1.38%)

Small changes (1✅, 212🟥)

Too many entries to display here. View the full report on radar instead.

@nomeata
Copy link
Collaborator Author

nomeata commented Feb 27, 2026

!radar

@leanprover-radar
Copy link

leanprover-radar commented Feb 27, 2026

Benchmark results for ee068d5 against 87ec768 are in! @nomeata

  • 🟥 build exited with code -1
  • 🟥 other exited with code -1

No significant changes detected.

@nomeata nomeata force-pushed the joachim/instantiateMVarsNoUpdate branch from ee068d5 to ae9230f Compare February 27, 2026 23:17
@nomeata
Copy link
Collaborator Author

nomeata commented Feb 27, 2026

!radar

@leanprover-radar
Copy link

leanprover-radar commented Feb 27, 2026

Benchmark results for ae9230f against 87ec768 are in! @nomeata

  • 🟥 build//instructions: +95.9G (+0.76%)

Large changes (8🟥)

  • 🟥 build/module/Init.Data.Iterators.Lemmas.Combinators.Monadic.FilterMap//instructions: +5.0G (+10.94%)
  • 🟥 build/module/Init.Data.String.Lemmas.Pattern.String.ForwardSearcher//instructions: +7.5G (+38.64%)
  • 🟥 build/module/Std.Data.Iterators.Lemmas.Combinators.Monadic.FilterMap//instructions: +5.2G (+95.78%)
  • 🟥 build/module/Std.Tactic.BVDecide.Bitblast.BVExpr.Circuit.Lemmas.Expr//instructions: +8.5G (+7.46%)
  • 🟥 elab/big_struct_dep//instructions: +4.1G (+19.96%)
  • 🟥 elab/big_struct_dep1//instructions: +872.4M (+13.71%)
  • 🟥 elab/grind_bitvec2//instructions: +3.5G (+1.72%)
  • 🟥 elab/omega_stress//instructions: +258.0M (+5.69%)

Medium changes (21🟥)

Too many entries to display here. View the full report on radar instead.

Small changes (1✅, 207🟥)

Too many entries to display here. View the full report on radar instead.

@nomeata nomeata changed the title perf: a non-updating single-subst instantiateMVar perf: fuse fvar substitution into instantiateMVars Feb 27, 2026
mathlib-nightly-testing bot pushed a commit to leanprover-community/batteries that referenced this pull request Feb 28, 2026
mathlib-nightly-testing bot pushed a commit to leanprover-community/mathlib4-nightly-testing that referenced this pull request Feb 28, 2026
@mathlib-lean-pr-testing mathlib-lean-pr-testing bot removed the builds-mathlib CI has verified that Mathlib builds against this PR label Feb 28, 2026
@mathlib-lean-pr-testing
Copy link

Mathlib CI status (docs):

@mathlib-lean-pr-testing mathlib-lean-pr-testing bot added the breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan label Feb 28, 2026
…gned

This PR adds `instantiateAllMVars`, a variant of `instantiateMVars` that
fails fast when encountering an unassigned mvar under an active fvar
substitution, instead of calling the expensive `elimMVarDeps` machinery.
Use it at call sites where all mvars are known to be assigned. Reverts
the drop-in replacement of `instantiateMVars` with `instantiateMVarsNoUpdate`
and instead targets `instantiateMVarsAtPreDecls` as the primary call site.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nomeata
Copy link
Collaborator Author

nomeata commented Feb 28, 2026

!bench

@nomeata
Copy link
Collaborator Author

nomeata commented Feb 28, 2026

!radar

@leanprover-radar
Copy link

leanprover-radar commented Feb 28, 2026

Benchmark results for 2c6ee17 against 87ec768 are in! @nomeata

  • 🟥 build//instructions: +5.3G (+0.04%)

Small changes (1🟥)

  • 🟥 build/module/Lean.MetavarContext//instructions: +76.4M (+0.70%)

…AllMVars

Rewrite instantiateAllMVars as a two-pass algorithm:
- Pass 1 resolves direct mvar assignments with write-back caching
- Pass 2 resolves delayed assignments with fused fvar substitution

Add instantiateAllMVarsSharing variant that uses a stack of (ptr, depth)
caches with scope tracking to preserve sharing across visit_delayed
boundaries. When lookup_fvar substitutes a fvar, it records which scope
added it. Results are cached at the outermost valid scope level, so they
persist when inner scopes are popped. This achieves O(n²) output sharing
(matching standard instantiateMVars) vs O(n³) without sharing preservation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mathlib-nightly-testing bot pushed a commit to leanprover-community/batteries that referenced this pull request Mar 2, 2026
mathlib-nightly-testing bot pushed a commit to leanprover-community/mathlib4-nightly-testing that referenced this pull request Mar 2, 2026
…riginal

Make the two-pass instantiateAllMVars/instantiateAllMVarsSharing handle
unassigned mvars (via abstract_mvar_fvars/elimMVarDeps) instead of failing.
This makes them usable as a drop-in replacement at call sites where mvars
may not all be assigned, though the unassigned-mvar behavior differs from
the standard implementation (lambda-wrapping vs beta-reduction).

Also expose instantiateMVarsOriginal for benchmarking independently of
which implementation is the default.

Note: NOT yet wired as default instantiateMVars — the behavioral difference
for unassigned mvars causes test failures (e.g. meta7.lean). The redirect
is reverted; these remain as separate API functions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mathlib-lean-pr-testing mathlib-lean-pr-testing bot added builds-mathlib CI has verified that Mathlib builds against this PR and removed breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan labels Mar 2, 2026
@mathlib-lean-pr-testing
Copy link

Mathlib CI status (docs):

mathlib-nightly-testing bot pushed a commit to leanprover-community/batteries that referenced this pull request Mar 2, 2026
mathlib-nightly-testing bot pushed a commit to leanprover-community/mathlib4-nightly-testing that referenced this pull request Mar 2, 2026
@mathlib-lean-pr-testing
Copy link

Mathlib CI status (docs):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

builds-mathlib CI has verified that Mathlib builds against this PR changelog-language Language features and metaprograms mathlib4-nightly-available A branch for this PR exists at leanprover-community/mathlib4-nightly-testing:lean-pr-testing-NNNN toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants