perf: fuse fvar substitution into instantiateMVars#12233
perf: fuse fvar substitution into instantiateMVars#12233
Conversation
|
!radar |
|
Benchmark results for c83f527 against 9e18eea are in! @nomeata
Large changes (1✅, 2🟥)
Small changes (7🟥)
|
|
Reference manual CI status:
|
|
Mathlib CI status (docs):
|
…date This adds an experimental `instantiateMVarsNoUpdate2` variant (inline in the benchmark) that keeps its expression cache across binder depths using bvar-blind keying, rather than clearing the cache at each depth. Bench2 shows it avoids the O(depth * bodySize) scaling of the old approach when the same closed sub-expression appears at many depths (crossover ~depth=50). However, the ~6ms constant per-node overhead makes it 3-5x slower on the bench1 workload (the realistic delayed-assignment case), where both variants are already sub-millisecond. Not worth adopting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This PR adds a C++ implementation of `instantiateMVarsNoUpdate`, following the pattern of the existing `instantiate_mvars.cpp`. Key features: - FVar substitution via hash map instead of `replace_fvars`, avoiding a separate traversal pass for delayed assignments - Depth tracking with lazy lifting of substitution values - Two-level cache: a depth-dependent cache (cleared under binders) and a global cache for fvar-free expressions that persists across binder depths - When the fvar substitution is empty, updates mvar assignments with their normalized values so subsequent `instantiateMVars` calls are free Benchmark (delayed-assignment workload): - C++ NoUpdate is ~2x faster than the Lean NoUpdate implementation - C++ NoUpdate is 10-26x faster than standard instantiateMVars - The global cache eliminates O(depth) scaling for fvar-free expressions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
267f6b4 to
860b8fd
Compare
This PR makes instantiateMVarsNoUpdate the default implementation behind lean_instantiate_expr_mvars. The NoUpdate variant carries a free variable substitution during traversal instead of doing repeated replace_fvars, avoiding quadratic behavior in terms with many delayed-assigned metavariables. Key fixes for drop-in compatibility: - Use name_hash_map for fvar substitution (structural name equality, not pointer comparison) - Always call apply_beta with preserve_data=false for regular mvar assignments to strip mdata wrappers like _recApp - Clear depth-dependent cache when extending fvar substitution in visit_delayed - Handle unassigned mvars with active fvar substitution by creating delayed assignments via elimMVarDeps - Match standard instantiateMVars guards for delayed assignment resolution Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
!radar |
|
Benchmark results for c5d6267 against 9e18eea are in! @nomeata Warning These warnings may indicate that the benchmark results are not directly comparable, for example due to changes in the runner configuration or hardware.
Large changes (7🟥)
Medium changes (19🟥)
Small changes (1✅, 212🟥) Too many entries to display here. View the full report on radar instead. |
|
!radar |
|
Benchmark results for ee068d5 against 87ec768 are in! @nomeata
No significant changes detected. |
…lean4 into joachim/instantiateMVarsNoUpdate
ee068d5 to
ae9230f
Compare
|
!radar |
|
Benchmark results for ae9230f against 87ec768 are in! @nomeata
Large changes (8🟥)
Medium changes (21🟥) Too many entries to display here. View the full report on radar instead. Small changes (1✅, 207🟥) Too many entries to display here. View the full report on radar instead. |
|
Mathlib CI status (docs):
|
…gned This PR adds `instantiateAllMVars`, a variant of `instantiateMVars` that fails fast when encountering an unassigned mvar under an active fvar substitution, instead of calling the expensive `elimMVarDeps` machinery. Use it at call sites where all mvars are known to be assigned. Reverts the drop-in replacement of `instantiateMVars` with `instantiateMVarsNoUpdate` and instead targets `instantiateMVarsAtPreDecls` as the primary call site. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
!bench |
|
!radar |
|
Benchmark results for 2c6ee17 against 87ec768 are in! @nomeata
Small changes (1🟥)
|
…AllMVars Rewrite instantiateAllMVars as a two-pass algorithm: - Pass 1 resolves direct mvar assignments with write-back caching - Pass 2 resolves delayed assignments with fused fvar substitution Add instantiateAllMVarsSharing variant that uses a stack of (ptr, depth) caches with scope tracking to preserve sharing across visit_delayed boundaries. When lookup_fvar substitutes a fvar, it records which scope added it. Results are cached at the outermost valid scope level, so they persist when inner scopes are popped. This achieves O(n²) output sharing (matching standard instantiateMVars) vs O(n³) without sharing preservation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…riginal Make the two-pass instantiateAllMVars/instantiateAllMVarsSharing handle unassigned mvars (via abstract_mvar_fvars/elimMVarDeps) instead of failing. This makes them usable as a drop-in replacement at call sites where mvars may not all be assigned, though the unassigned-mvar behavior differs from the standard implementation (lambda-wrapping vs beta-reduction). Also expose instantiateMVarsOriginal for benchmarking independently of which implementation is the default. Note: NOT yet wired as default instantiateMVars — the behavioral difference for unassigned mvars causes test failures (e.g. meta7.lean). The redirect is reverted; these remain as separate API functions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Mathlib CI status (docs):
|
|
Mathlib CI status (docs):
|
This PR adds
instantiateAllMVarsandinstantiateAllMVarsSharing, two-pass variants ofinstantiateMVarsfor call sites where all metavariables are known to be assigned. They fail fast (returningnone) on unassigned metavariables instead of leaving them in place or calling the expensiveelimMVarDepsmachinery.Two-pass architecture
Pass 1 (
instantiate_direct_fn): StandardinstantiateMVars-like traversal that resolves direct mvar assignments with write-back and a single persistent cache. For delayed assignments, it pre-normalizes the pending value (resolving its direct chain) but leaves the delayed mvar application in the expression.Pass 2 (
instantiate_delayed_fn): Fused traversal that resolves delayed assignments by carrying an fvar substitution map. When encountering?m [x₁,…,xₖ] := ?pending, it extends the substitution with[xᵢ ↦ argᵢ]and continues into the pending value. Since pass 1 has pre-normalized all direct chains, each pending value is compact and visited once.Sharing preservation (
instantiateAllMVarsSharing)The fused fvar-substitution approach can lose sharing across delayed-mvar boundaries: when a shared subexpression (like
@Eq Nat s s) appears both inside a delayed mvar's value and in a type argument, the cache scope swap atvisit_delayedentry causes each occurrence to independently produce the same result, yielding distinct objects.instantiateAllMVarsSharingsolves this with a scope-tracked cache stack:visit_delayedcall pushes a new cache scope, and records the scope level in each fvar substitution entrym_result_scopetracks the maximum fvar-scope used by each subtree (via save/reset/restore invisit())m_result_scope), so entries that only depend on outer-scope fvars persist when inner scopes are poppedThis achieves O(n²) output sharing matching
instantiateMVars, vs O(n³) without sharing preservation.Benchmark results (pathological nested-delayed-mvar case)
Files
src/kernel/instantiate_mvars_all.cpp— C++ implementation of both passessrc/Lean/Meta/InstMVarsAll.lean— Lean wrappers forinstantiateAllMVarsandinstantiateAllMVarsSharingtests/lean/run/instantiateAllMVarsSharing.lean— Minimal test verifying sharing behavior of both variants🤖 Generated with Claude Code