Add DynamicSVD and ToeplitzWhitener by cweniger · Pull Request #89 · cweniger/falcon

cweniger · 2026-06-08T10:57:38Z

Summary

DynamicSVD: streaming SVD with eigenvalue-scaled momentum updates, Procrustes-stabilized output coefficients, and Wiener filter shrinkage. Accepts an optional injected whitener; when provided, update(x, signal) computes noise = x - signal to update the whitener before the SVD step.
ToeplitzWhitener: stationary noise whitener for 1D time series. Estimates per-frequency variance via EMA in Hartley space; lazy-initialized, zero-mean assumption.
Example 05: adds --embedding svd mode wiring DynamicSVD(whitener=DiagonalWhitener(...)) as a drop-in embedding that handles whitening and dimensionality reduction jointly.
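
The ToeplitzWhitener idea in the summary above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code in this PR: the class name `HartleyEMAWhitener`, the `momentum` and `eps` parameters, and the pass-through behaviour before the first update are all assumptions. It relies on the orthonormal discrete Hartley transform being its own inverse and diagonalizing the covariance of stationary noise under periodic boundary conditions.

```python
import numpy as np


def hartley(x):
    """Orthonormal discrete Hartley transform along the last axis (self-inverse)."""
    f = np.fft.fft(x, axis=-1)
    return (f.real - f.imag) / np.sqrt(x.shape[-1])


class HartleyEMAWhitener:
    """Hypothetical sketch: per-frequency variance tracked by an EMA in Hartley space."""

    def __init__(self, momentum=0.1, eps=1e-8):
        self.momentum = momentum  # assumed EMA weight of the newest batch
        self.eps = eps
        self.var = None  # lazily initialised on the first update

    def update(self, noise):
        # Zero-mean assumption: the mean square is used directly as the variance.
        batch_var = np.mean(hartley(noise) ** 2, axis=0)
        if self.var is None:
            self.var = batch_var
        else:
            self.var = (1.0 - self.momentum) * self.var + self.momentum * batch_var

    def __call__(self, x):
        if self.var is None:
            return x  # assumed behaviour before any statistics exist
        return hartley(hartley(x) / np.sqrt(self.var + self.eps))


# Stationary (circularly correlated) noise as a toy input.
rng = np.random.default_rng(0)
white = rng.normal(size=(2000, 64))
kernel = np.exp(-np.arange(64) / 3.0)
coloured = np.fft.ifft(np.fft.fft(white, axis=-1) * np.fft.fft(kernel), axis=-1).real

whitener = HartleyEMAWhitener()
whitener.update(coloured)
whitened = whitener(coloured)
```

After one update on this toy data, the whitened samples have unit mean square, while the coloured input does not.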

DynamicSVD supersedes PCAProjector: it uses a mathematically correct eigenvalue-scaled covariance blend (rather than a linear component average) and Procrustes alignment for output stability — making it suitable as a neural network input stage. PCAProjector is left in place for backwards compatibility.
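
One plausible reading of "eigenvalue-scaled covariance blend" and "Procrustes alignment" is sketched below in NumPy. It is an illustration under stated assumptions, not the implementation in `falcon/embeddings/svd.py`: the function names, the `momentum` convention (weight of the new batch), and the use of an uncentered second moment are all hypothetical. The old basis is scaled by the square roots of its eigenvalues and stacked with the scaled batch, so that a single SVD of the stack yields the eigendecomposition of the blended covariance.

```python
import numpy as np


def blended_svd_update(components, eigenvalues, batch, momentum=0.1):
    """Blend (1 - m) * V^T diag(lam) V with m * batch^T batch / n and re-decompose.

    components: (k, D) with orthonormal rows; eigenvalues: (k,); batch: (n, D).
    """
    n, k = batch.shape[0], components.shape[0]
    old_factor = np.sqrt((1.0 - momentum) * eigenvalues)[:, None] * components
    new_factor = np.sqrt(momentum / n) * batch
    stacked = np.vstack([old_factor, new_factor])
    # stacked.T @ stacked equals the blended covariance, so its SVD gives the new basis.
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k], s[:k] ** 2


def procrustes_rotation(new_components, old_components):
    """Orthogonal R (k, k) minimising ||R @ new_components - old_components||_F."""
    u, _, wt = np.linalg.svd(old_components @ new_components.T)
    return u @ wt


rng = np.random.default_rng(0)
D = 5
V0 = np.linalg.qr(rng.normal(size=(D, D)))[0].T  # orthonormal rows
lam0 = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
batch = rng.normal(size=(8, D))

V1, lam1 = blended_svd_update(V0, lam0, batch, momentum=0.2)
R = procrustes_rotation(V1, V0)
aligned = R @ V1  # rotated basis that stays as close as possible to the old one
```

Projecting with `aligned` instead of `V1` removes the arbitrary sign flips and rotations that a fresh SVD can introduce between updates, which is the property a downstream network would need from its input stage.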

Design notes

Whitener is injected, not baked in — swap DiagonalWhitener for ToeplitzWhitener (or nothing) depending on the noise structure.
forward(x, signal=None) accepts signal for interface symmetry with update() but does not use it at inference (whitener stats are frozen).
reconstruct(x) returns Wiener-filtered data in whitened D-dimensional space.
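
The PR text does not spell out the shrinkage formula used by reconstruct(x). A common choice in whitened space, where the noise variance is 1 in every direction, is the Wiener gain (lam - 1) / lam per component, clipped at zero. The NumPy sketch below assumes exactly that; the function name and the clipping rule are illustrative assumptions rather than the behaviour of DynamicSVD.

```python
import numpy as np


def wiener_reconstruct(x, components, eigenvalues):
    """Project whitened data onto the basis, shrink each coefficient, map back to D dims.

    Assumes unit noise variance, so an eigenvalue lam splits into signal (lam - 1) and noise 1.
    x: (batch, D); components: (k, D) with orthonormal rows; eigenvalues: (k,).
    """
    gains = np.clip(eigenvalues - 1.0, 0.0, None) / np.maximum(eigenvalues, 1e-12)
    coeffs = x @ components.T          # (batch, k)
    return (coeffs * gains) @ components  # (batch, D)


components = np.eye(3)[:2]            # two axis-aligned components in D = 3
eigenvalues = np.array([5.0, 0.5])    # second component is noise-dominated
x = np.array([[2.0, 4.0, 6.0]])
filtered = wiener_reconstruct(x, components, eigenvalues)
```

In this toy case the first coefficient is scaled by 0.8, the noise-dominated second coefficient is suppressed, and the direction outside the basis is dropped.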

Test plan

DynamicSVD without whitener: output shape (batch, k), zeros before first update
DynamicSVD with DiagonalWhitener: output std ~1 after warmup
DynamicSVD with ToeplitzWhitener on 1D time series
Example 05: python standalone.py --embedding svd --num_steps 1000 --n_bins 512 --device cpu

🤖 Generated with Claude Code

Captures the design for a Python-first front door to falcon: flat typed config surface bridged to nested YAML, the product/sum/composite/collection config-shape taxonomy, _target_ resolution unification (Step 0), the init/launch/shutdown Ray lifecycle, the cloudpickle escape hatch for notebook-defined models, JAX process-global-state handling, and the v1 interleaved color-tagged log stream for in-cell output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Step 0 (_target_ unification, net_type → variant classes, NetworkConfig untangle) deferred: no variant-specific hyperparams exposed yet, churn with no functional benefit; net_config: dict={} is the escape hatch if needed
- prior list-syntax: existing list-of-lists form serves as Python API too, no typed-marginal objects needed for v1
- falcon.Simulator base class deferred: duck typing is sufficient for v1
- falcon.session() deferred: not needed before basic API works
- falcon.init(): remove num_cpus/num_gpus, use **ray_init_kwargs passthrough
- falcon.launch(): remove buffer_min_samples etc., model config belongs in Config/overrides; rename posterior_sample -> auto_sample
- falcon.Sequential: dropped, use _input_ nesting instead
- escape hatch: drop source-extraction to _live_objects.py, placeholder "<live object: ClassName>" is sufficient for v1
- example notebooks: .py (jupytext) as source of truth, .ipynb as build artefacts; existing run.py files untouched
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Splits launch_mode into:
- _run_pipeline(cfg, *, auto_sample, timeout, stop_check, log_handler, on_graph_ready, summary_sink): pure training pipeline, no Ray lifecycle, no TUI concerns; injectable stop_check and log_handler; returns output_dir
- launch_mode: thin CLI frontend; owns Ray init, TUI/shutdown-handler setup, stop_check closure, TUI log handler, status polling thread (via on_graph_ready)
CLI behavior is byte-for-byte unchanged. _run_pipeline is now directly callable for the upcoming falcon.launch() Python API.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Tests plain callables, torch simulators, nn.Module subclasses, transitive __main__ deps, global closures (including ~8 MB), and class redefinition. All pass. CUDA tensors stored as instance attributes fail as expected; workaround (store numpy, convert inside forward) confirmed working. Conclusion: cloudpickle + Ray handles all normal notebook simulator patterns. Notebook-defined classes can be passed directly to add_node(); Ray's built-in cloudpickle serializes them transparently to actor processes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- falcon/core/flat_config.py: flat_to_nested, make_flat_signature, apply_flat_signature utilities for prefix-transform config builders
- falcon/api.py: Config wrapper + falcon.config() entry point
- falcon/estimators/flow.py: _FlowConfigBuilder with synthesised flat signature; Flow.__new__ returns builder when called without positional args, real estimator otherwise
- falcon/estimators/gaussian_fullcov.py: rename GaussianPosterior → _GaussianPosterior (implementation detail, not public API)
- falcon/estimators/gaussian.py: add deprecation TODO; update import
- falcon/estimators/__init__.py: remove GaussianPosterior from exports
- falcon/__init__.py: expose falcon.config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
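
The prefix transform behind a utility like flat_to_nested can be illustrated with a few lines of plain Python. The real signature in falcon/core/flat_config.py is not shown in this PR, so the function name `flat_to_nested_sketch`, the explicit `sections` argument, and the handling of unprefixed keys below are assumptions made for illustration only.

```python
def flat_to_nested_sketch(flat, sections):
    """Turn {"loop_max_epochs": 200} into {"loop": {"max_epochs": 200}}.

    flat: dict of flat keyword arguments.
    sections: iterable of section names used as key prefixes (assumed convention).
    Keys that match no section prefix are kept at the top level.
    """
    nested = {}
    for key, value in flat.items():
        for section in sections:
            prefix = section + "_"
            if key.startswith(prefix):
                nested.setdefault(section, {})[key[len(prefix):]] = value
                break
        else:
            nested[key] = value
    return nested


nested = flat_to_nested_sketch(
    {"loop_max_epochs": 200, "optimizer_lr": 1e-3, "device": "cpu"},
    sections=("loop", "optimizer"),
)
```

Here `nested` groups the two prefixed keys under "loop" and "optimizer" and leaves "device" untouched, which is the shape a nested YAML config would have.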

DynamicSVD: streaming SVD with eigenvalue-scaled momentum updates, Procrustes-stabilized output, Wiener filter shrinkage, and optional injected whitener (update(x, signal) estimates noise = x - signal). ToeplitzWhitener: stationary noise whitener via EMA variance estimation in Hartley space; lazy-initialized, zero-mean assumption. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- falcon/api.py: add init(), shutdown(), _prepare_config(), launch()
  launch() resolves Config/path/dict target, lazily inits Ray, calls _run_pipeline, returns load_run(output_dir); wait=False raises NotImplementedError (deferred to Step 7)
- falcon/__init__.py: expose init, launch, shutdown via lazy imports
- examples/01_minimal/notebook.py: jupytext percent-format notebook covering config load, override, launch, and run inspection (cell story A)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- falcon/core/graph.py: Graph() starts empty (node_list=None default); extract _build(); add add_node() accepting live instances, observed=array, and ray_* kwargs; guard forward_deps.get() for partial graphs
- falcon/core/deployed_graph.py: NodeWrapper skips LazyLoader for live simulator instances (isinstance str/type check)
- falcon/cli.py: _run_pipeline gains graph=/observations= params; when provided, create_graph_from_config is bypassed
- falcon/api.py: _prepare_config handles Graph target by synthesising a default config with _graph_to_config_dict escape-hatch serialization (<live object: ...> / <live array: ...> placeholders); launch() threads prebuilt_graph through to _run_pipeline
- falcon/estimators/flow.py: guard OmegaConf.to_container on None embedding
- falcon/embeddings/builder.py: instantiate_embedding(None) returns _PassthroughEmbedding (identity, casts to float32)
- examples/04_gaussian/notebook.py: jupytext notebook for programmatic API
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds --embedding svd option to standalone.py, wiring DynamicSVD with DiagonalWhitener as a drop-in embedding that handles whitening and dimensionality reduction jointly via streaming SVD updates. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codecov · 2026-06-08T10:59:49Z

Codecov Report

❌ Patch coverage is 1.12721% with 614 lines in your changes missing coverage. Please review.
✅ Project coverage is 9.00%. Comparing base (e22b690) to head (d274630).
⚠️ Report is 21 commits behind head on main.

Files with missing lines	Patch %	Lines
falcon/api.py	0.00%	131 Missing ⚠️
falcon/cli.py	0.00%	101 Missing ⚠️
falcon/embeddings/svd.py	0.00%	72 Missing ⚠️
falcon/core/run_loader.py	0.00%	65 Missing ⚠️
falcon/estimators/flow.py	0.00%	60 Missing ⚠️
falcon/core/graph.py	0.00%	53 Missing ⚠️
falcon/estimators/gaussian_fullcov.py	0.00%	50 Missing ⚠️
falcon/estimators/stepwise_base.py	0.00%	28 Missing ⚠️
falcon/embeddings/norms.py	0.00%	18 Missing ⚠️
falcon/core/deployed_graph.py	0.00%	16 Missing ⚠️
... and 5 more

Additional details and impacted files

@@           Coverage Diff            @@
##            main     #89      +/-   ##
========================================
- Coverage   9.72%   9.00%   -0.73%     
========================================
  Files         33      34       +1     
  Lines       4154    4509     +355     
========================================
+ Hits         404     406       +2     
- Misses      3750    4103     +353

Flag	Coverage Δ
unit	`9.00% <1.12%> (-0.73%)`	⬇️


- Graph._repr_html_: Mermaid flowchart (CDN-loaded) with colour-coded nodes (blue=trainable, green=observed, yellow=deterministic), solid forward edges, dashed evidence edges
- _short_cls_name(): module-level helper for compact class display names
- Run._repr_html_: inline HTML status card showing per-node final loss and epoch count
- Run.plot_metrics(): matplotlib figure of train/val loss curves, one subplot per node
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Generated from jupytext percent-format notebook.py sources. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

_launch was async def but contained zero await calls — all blocking was done via ray.get() / ray.wait(). Wrapping it in asyncio.run() broke Jupyter notebooks (which already have a running event loop). Fix: make _launch a plain def and call it directly; remove unused asyncio import. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
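
The failure mode described above can be reproduced with the standard library alone. In the sketch below, `notebook_cell` stands in for code executing inside an already-running event loop (as in Jupyter), and `_launch_async` / `_launch_sync` are hypothetical stand-ins for the old and new `_launch`; none of these names come from the falcon code base.

```python
import asyncio


async def _launch_async():
    # Stand-in for the old `async def _launch`: it never awaits anything.
    return "output_dir"


def _launch_sync():
    # Stand-in for the fixed version: a plain function, called directly.
    return "output_dir"


async def notebook_cell():
    """Runs inside a live event loop, like a Jupyter cell does."""
    coro = _launch_async()
    try:
        asyncio.run(coro)  # raises: an event loop is already running
        nested_ok = True
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        nested_ok = False
    return nested_ok, _launch_sync()


nested_ok, result = asyncio.run(notebook_cell())
```

The nested asyncio.run() call fails with a RuntimeError, while the plain function works in both contexts, which is why dropping `async` is sufficient when nothing is awaited.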

The name "DatasetManager" caused a crash on the second falcon.launch() call in the same notebook session — Ray rejects duplicate actor names. The name was never used for lookup, so dropping it is safe. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

OmegaConf.resolve() was called in-place on the Config's DictConfig, baking the first run's run_dir into all interpolated paths. A second launch() call with a different output dir then inherited the resolved paths from the first run (e.g. paths.graph pointed at run6 even when output was run7). Fix: copy via OmegaConf.merge(cfg, {}) before mutating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Same guard as flow.py: OmegaConf.to_container(None) raises ValueError, so skip it when no embedding config is provided. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When target is a Graph, always set prebuilt_graph regardless of whether a saved config.yml exists. The saved config contains <live object/array> placeholders that create_graph_from_config cannot parse as file paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

__init__ is now a pure dataclass (stores flat config kwargs only). setup() is load-bearing: receives runtime objects from NodeWrapper, merges stored config with YAML-sourced config, and wires everything up.
BaseEstimator:
- __init__(**flat_kwargs): stores self._init_flat_kwargs via flat_to_nested
- __init_subclass__: injects per-class __init__ with flat signature when _CONFIG_SECTIONS is declared, giving autocomplete for free
- setup() declared as abstract
StepwiseEstimator:
- __init__ removed (uses BaseEstimator's)
- setup() initialises common loop state; subclasses set loop_config first
Flow / GaussianFullCov:
- _CONFIG_SECTIONS + _CONFIG_EXTRA_PARAMS replace _FlowConfigBuilder
- __new__ trick and _FlowConfigBuilder removed entirely
- setup() merges defaults < notebook kwargs < YAML/override config
NodeWrapper:
- Detects BaseEstimator instances (notebook path) vs class/string (YAML)
- Always calls .setup() to wire up runtime components
Result: Flow(loop_max_epochs=200) returns a real Flow. isinstance() works. New estimators need only declare _CONFIG_SECTIONS and implement setup().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
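
The __init_subclass__ mechanism described in this commit can be sketched with the standard library as follows. This is a simplified illustration, not falcon's BaseEstimator: the class names carry a `Sketch` suffix, `_CONFIG_SECTIONS` is assumed to map section names to dicts of defaults, and the stored kwargs are kept flat rather than passed through flat_to_nested.

```python
import inspect


class BaseEstimatorSketch:
    """Hypothetical base class: subclasses declare _CONFIG_SECTIONS, get a typed __init__."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        sections = cls.__dict__.get("_CONFIG_SECTIONS")
        if not sections:
            return
        # Build a flat, keyword-only signature such as (self, *, loop_max_epochs=100, ...).
        params = [inspect.Parameter("self", inspect.Parameter.POSITIONAL_OR_KEYWORD)]
        for section, defaults in sections.items():
            for name, default in defaults.items():
                params.append(
                    inspect.Parameter(
                        f"{section}_{name}", inspect.Parameter.KEYWORD_ONLY, default=default
                    )
                )
        signature = inspect.Signature(params)

        def __init__(self, **flat_kwargs):
            bound = signature.bind(self, **flat_kwargs)  # TypeError on unknown names
            bound.apply_defaults()
            self._init_flat_kwargs = {
                key: value for key, value in bound.arguments.items() if key != "self"
            }

        __init__.__signature__ = signature  # what IDEs and help() will display
        cls.__init__ = __init__

    def setup(self):
        raise NotImplementedError  # runtime wiring happens here, not in __init__


class FlowSketch(BaseEstimatorSketch):
    _CONFIG_SECTIONS = {"loop": {"max_epochs": 100}, "optimizer": {"lr": 1e-3}}


flow = FlowSketch(loop_max_epochs=200)
```

Because the constructor returns a real instance and only records configuration, `isinstance(flow, FlowSketch)` holds and no builder object or `__new__` trick is needed.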

Flow and GaussianFullCov __init__ params are now plain names (max_epochs, net_type, lr, gamma, lr_patience, use_best_models, ...) instead of loop_/network_/optimizer_/inference_ prefixes. deployed_graph.py passes flat YAML dict directly as **kwargs to estimator_cls.__init__; setup() no longer takes a config arg. All example YAML files updated to flat format. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

….yml Notebooks are the source of truth — .py mirror files removed. 04_gaussian notebook updated: GaussianFullCov now instantiated with explicit params matching config.yml (max_epochs, lr, gamma, etc.). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…roposals)
- Split LinearSimulator into SignalSimulator + NoiseSimulator (scaffold mechanism)
- DynamicSVD auto-updates in forward() during training (no wrapper needed)
- config: scaffolds: [mu], n_components: 32, prior_epochs: 20
- run.py: single entry point with corner/buffer/loss plots
- gen_mock_data.py: noisy observation, analytic posterior saved
- Moved standalone.py to extras/
- Known issue: proposal samples wider than prior after few rounds -- root cause TBD
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ponents _best_model was a deepcopy of _model taken before DynamicSVD ever ran its first update (the initial eval-mode forward pass in _create_model skips update()). DynamicSVD.components/eigenvalues/_R are plain Python attrs, not registered buffers, so load_state_dict never refreshes them. Every proposal sampling call therefore got torch.randn from the cold-start path instead of the actual SVD projection — widening proposals far beyond prior width. Fix: share the embedding object between _model and _best_model so the live (always up-to-date) DynamicSVD basis is used for both training and sampling. Also adds plot_buffer_mean to 05_linear_regression/run.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- max_epochs 100 → 1000 (model wasn't converged at 100)
- prior_epochs 40 → 20 (warm-up can be shorter with the embedding fix)
- snapshot_every 1 → 10 (reduce I/O overhead)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The previous fix shared the embedding object between _model and _best_model, which broke the invariant that _best_model represents the full state at the best validation epoch. Replaced with _sync_embedding_to_best(): when a new best checkpoint is saved, walk the matching module pairs and clone components/eigenvalues/_R from _model into _best_model alongside the load_state_dict weight copy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
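
The underlying PyTorch behaviour, and the shape of a fix like _sync_embedding_to_best(), can be illustrated with a toy module. The sketch below is not the falcon implementation: `ToyEmbedding` and `sync_plain_attrs` are hypothetical names, and only a single `components` attribute is synced. It shows that load_state_dict copies registered parameters and buffers only, so plain tensor attributes must be cloned separately.

```python
import copy

import torch
from torch import nn


class ToyEmbedding(nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))  # registered: part of state_dict
        self.components = None                    # plain attribute: NOT in state_dict


def sync_plain_attrs(src, dst, names=("components",)):
    """Clone selected plain tensor attributes between matching submodules."""
    for src_mod, dst_mod in zip(src.modules(), dst.modules()):
        for name in names:
            value = getattr(src_mod, name, None)
            if isinstance(value, torch.Tensor):
                setattr(dst_mod, name, value.clone())


model = nn.Sequential(ToyEmbedding(), nn.Linear(3, 1))
best = copy.deepcopy(model)              # snapshot taken before any basis update
model[0].components = torch.eye(3)       # stands in for the first SVD update

best.load_state_dict(model.state_dict())
stale_after_load = best[0].components is None  # weights copied, basis still missing

sync_plain_attrs(model, best)            # explicit copy of the plain attributes
```

Cloning, rather than sharing the object, keeps the best-checkpoint copy frozen at the state it had when it was saved.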

Remove defunct default-store-dir, default-graph-dir, outputs, and zarr entries (none referenced in the codebase). Add **/output/ to ignore example run directories. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cweniger and others added 10 commits May 17, 2026 19:12

Merge branch 'main' into plan/colab-notebook-api

1c807f1

cweniger and others added 16 commits June 8, 2026 13:04

Add notebook.ipynb for 01_minimal and 04_gaussian examples

2971058


Fix None embedding crash in GaussianFullCov._create_model

f826b2b


Merge branch 'plan/colab-notebook-api' into feature/dynamic-svd

c47cbc8

Clean up .gitignore: replace stale entries with **/output/

d274630


cweniger wants to merge 26 commits into main from feature/dynamic-svd
