fix: clamp H and W to epsilon floor after each LS-NMF multiplicative update (Rust) by collord · Pull Request #38 · quanted/esat

collord · 2026-02-23T17:38:55Z

Summary

The Rust ls_nmf implementation diverges to a spurious local minimum
(Q ≈ 1e5) and then falsely reports converged=True when the model is
initialised with the default column_mean method. The Python path is
unaffected.

Root cause

SA.initialize() clips any negative draws from the random normal
initialisation to 1e-12, leaving ~50 % of W and H entries at that
floor. With these near-zero entries, the denominator of the
multiplicative update

H ← H * (W.T @ (We·V)) / (W.T @ (We·W@H))

underflows in f32 on the first few steps, producing ratios >> 1 that
push H and W to a large-but-finite local minimum. Because the Q window
eventually stabilises there, the convergence check fires on garbage
factors.

The Python fallback (ls_nmf.py) happens to be numerically stable
enough to self-correct from the same starting point, so the mismatch
only surfaces when comparing the two paths directly.

Minimal reproducer

import numpy as np
from esat.model.sa import SA
from esat.metrics import q_loss

rng = np.random.default_rng(0)
m, n, p = 50, 12, 3
F = rng.uniform(0.1, 1.0, size=(p, n))
G = rng.uniform(0.1, 1.0, size=(m, p))
V = G @ F
U = 0.001 + 0.05 * np.abs(V)

for label, use_rust in [("Rust  ", True), ("Python", False)]:
    sa = SA(V=V, U=U, factors=p, method="ls-nmf", seed=42, parallel=False, verbose=False)
    sa.optimized = use_rust
    sa.train(max_iter=20000, converge_delta=0.01, converge_n=100)
    W, H = sa.W.astype(float), sa.H.astype(float)
    print(f"{label}: Q={q_loss(V,U,W,H):.2f}  converged={sa.converged}")

Before fix:

Rust  : Q=122014.37  converged=True   ← wrong
Python: Q=5.10       converged=True

Fix

Clamp every element of H and W to [1e-12, ∞) after each multiplicative
update step (ls_nmf_update_cpu and ls_nmf_update_gpu), mirroring the
epsilon floor that SA.initialize() already applies to the initial
matrices.

Test

tests/model/test_ls_nmf_rust_stability.py — three assertions:

Test	Pre-fix	Expected post-fix
Rust Q is finite	✅ pass	✅ pass
Rust Q ≈ Python Q under `column_mean` init	❌ fail (ratio ≈ 24 000×)	✅ pass
Rust Q ≈ Python Q under `kmeans` init	❌ fail (ratio ≈ 4 000 000×)	✅ pass

The Rust extension will need to be recompiled (maturin develop or
pip install -e .) before running the test suite.

🤖 Generated with Claude Code

…update The default `column_mean` initialisation clips ~50 % of W and H entries to 1e-12 (negative draws from the normal distribution). These near-zero entries caused the denominator of the multiplicative update H ← H * (W.T @ (We·V)) / (W.T @ (We·W@H)) to underflow in f32, producing a ratio >> 1 on the first several steps. The Rust implementation accumulated this error, converged to a spurious local minimum (Q ≈ 1e5), and reported `converged=True` on garbage factors because the Q-window eventually stabilised at the wrong value. The Python path avoided the trap due to differences in floating-point evaluation order with numpy. Fix: after each H and W multiplicative update, clamp every element to [1e-12, ∞), matching the epsilon floor that `SA.initialize()` already applies to the initial matrices. The same clamp is added to the GPU path for consistency. Adds a regression test (`tests/model/test_ls_nmf_rust_stability.py`) with three assertions: - Rust Q is finite (not NaN/Inf) — was already passing - Rust Q is within 20 % of Python Q under column_mean init — was failing - Rust Q is within 20 % of Python Q under kmeans init — was failing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

This PR fixes a critical numerical instability bug in the Rust LS-NMF implementation where near-zero matrix entries caused multiplicative update ratios to overflow, leading to spurious convergence at a poor local minimum (Q ≈ 1e5 instead of the expected Q ≈ 5).

Changes:

Added epsilon clamping (1e-12 floor) after each H and W multiplicative update in both CPU and GPU Rust paths
Added comprehensive regression test to verify Rust and Python implementations produce comparable results

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
`tests/model/test_ls_nmf_rust_stability.py`	New regression test with three test cases verifying numerical stability under column_mean and kmeans initialization
`rust/lib.rs`	Added post-update clamping to prevent NaN/Inf from near-zero denominators in both `ls_nmf_update_cpu` and `ls_nmf_update_gpu`

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-23T17:44:20Z

        w = w.mul(&w_delta)?;
+        // Same clamp for W.
+        let eps_w = Tensor::full(1e-12f64, w.shape(), w.device())?;
+        w = w.clamp(&eps_w, &Tensor::full(f64::MAX, w.shape(), w.device())?)?;


Similar to the H update above, the W clamping in the GPU path may not handle NaN values consistently with the CPU implementation. The CPU version explicitly replaces non-finite values with epsilon, while the GPU version relies on Tensor::clamp behavior which may propagate NaN. Consider adding explicit NaN handling for consistency.

Suggested change

w = w.clamp(&eps_w, &Tensor::full(f64::MAX, w.shape(), w.device())?)?;

w = w.clamp(&eps_w, &Tensor::full(f64::MAX, w.shape(), w.device())?)?;

// Explicitly sanitize non-finite values in W to match CPU behavior.

// Any NaN or infinite entries are replaced with epsilon to avoid propagation.

w = w.nan_to_num(1e-12f64)?;

Copilot · 2026-02-23T17:44:21Z

+            let eps_h = Tensor::full(1e-12f64, h.shape(), h.device())?;
+            h = h.clamp(&eps_h, &Tensor::full(f64::MAX, h.shape(), h.device())?)?;


The GPU implementation creates a new epsilon tensor for every H update in every iteration, which is inefficient. This tensor creation should be moved outside the loop to avoid the overhead of allocating the same tensor repeatedly during training (which can run for 20,000 iterations). The same epsilon tensor can be reused across all iterations.

Copilot · 2026-02-23T17:44:21Z

+        let eps_w = Tensor::full(1e-12f64, w.shape(), w.device())?;
+        w = w.clamp(&eps_w, &Tensor::full(f64::MAX, w.shape(), w.device())?)?;


The GPU implementation creates a new epsilon tensor for every W update in every iteration. This should be moved outside the loop and reused, similar to the recommendation for the H epsilon tensor. Creating f64::MAX tensors in every iteration is particularly inefficient.

Copilot · 2026-02-23T17:44:21Z

+            // Clamp: guard against NaN/Inf from near-zero denominators.
+            let eps_h = Tensor::full(1e-12f64, h.shape(), h.device())?;
+            h = h.clamp(&eps_h, &Tensor::full(f64::MAX, h.shape(), h.device())?)?;


The CPU and GPU implementations handle non-finite values differently. The CPU version explicitly checks for non-finite values (NaN or Inf) using !x.is_finite() and replaces them with the epsilon floor. However, the GPU version uses Tensor::clamp which may not handle NaN values the same way - depending on the clamp implementation, NaN values might propagate through rather than being replaced with epsilon. Consider using an explicit NaN check for the GPU path to ensure consistent behavior, or verify that Tensor::clamp handles NaN appropriately.

Address Copilot review feedback on quanted#38: - Move eps_h, eps_w, max_val tensor allocations before the loop to avoid 4 × max_iter GPU allocations per training run. - Remove fill_nan calls — Tensor::fill_nan is not a standard candle-core method. NaN cannot arise at this point anyway: the epsilon floor on both W and H from the previous iteration guarantees strictly positive denominators in the multiplicative update. Add comment explaining this reasoning. - CPU path is unchanged (already uses nalgebra .map() with an explicit is_finite() check per element). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings February 23, 2026 17:38

Copilot started reviewing on behalf of collord February 23, 2026 17:39 View session

Copilot AI reviewed Feb 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: clamp H and W to epsilon floor after each LS-NMF multiplicative update (Rust)#38

fix: clamp H and W to epsilon floor after each LS-NMF multiplicative update (Rust)#38
collord wants to merge 2 commits into
quanted:mainfrom
collord:fix/ls-nmf-epsilon-clamp

collord commented Feb 23, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-        w = w.clamp(&eps_w, &Tensor::full(f64::MAX, w.shape(), w.device())?)?;
+        w = w.clamp(&eps_w, &Tensor::full(f64::MAX, w.shape(), w.device())?)?;
+        // Explicitly sanitize non-finite values in W to match CPU behavior.
+        // Any NaN or infinite entries are replaced with epsilon to avoid propagation.
+        w = w.nan_to_num(1e-12f64)?;

		let eps_h = Tensor::full(1e-12f64, h.shape(), h.device())?;
		h = h.clamp(&eps_h, &Tensor::full(f64::MAX, h.shape(), h.device())?)?;

		let eps_w = Tensor::full(1e-12f64, w.shape(), w.device())?;
		w = w.clamp(&eps_w, &Tensor::full(f64::MAX, w.shape(), w.device())?)?;

Conversation

collord commented Feb 23, 2026

Summary

Root cause

Minimal reproducer

Fix

Test

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants