Notebook/Colab API + clean estimator __init__ params by cweniger · Pull Request #90 · cweniger/falcon

cweniger · 2026-06-08T21:11:25Z

Summary

New Python API (`falcon/api.py`): `falcon.Graph()`, `falcon.launch()`, `falcon.shutdown()` — build and run models programmatically without YAML, ship live Python objects (simulators, estimators, embeddings) to Ray actors via cloudpickle
Clean estimator `__init__`: `Flow` and `GaussianFullCov` now take explicit named params (`max_epochs`, `net_type`, `lr`, `gamma`, `lr_patience`, `use_best_models`, ...) instead of prefixed `loop_/network_/optimizer_/inference_` names; this is LSP/autocomplete friendly and raises `TypeError` on typos
Flat YAML: estimator config sections (`loop:`, `network:`, `optimizer:`, `inference:`) replaced with flat keys matching `__init__` param names exactly; `deployed_graph.py` passes the YAML dict directly as `**kwargs`
Two-phase estimator lifecycle: `__init__` stores config only, `setup()` wires runtime objects; `setup()` no longer takes a `config=` arg
Example notebooks: `01_minimal` and `04_gaussian` notebooks added/updated with programmatic API; all 7 example YAML configs updated to flat format
`run_loader.py`: `Run` object with `.config`, `.metrics`, `.samples`, `.buffer` for post-training analysis in notebooks
Rich reprs: `Graph` and `Run` render as Mermaid DAGs / summary tables in Jupyter
Cleanup: extracted `_build_optimizer()` helper in `GaussianFullCov` and `LossBasedEstimator`; removed planning artifacts; fixed stale TODO/FIXME comments
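The two-phase lifecycle and the flat `**kwargs` path described above can be sketched as follows. This is a minimal illustration, not falcon code: `ToyEstimator` and the arguments of its `setup()` are hypothetical stand-ins, and only the parameter names (`max_epochs`, `lr`, `lr_patience`, `use_best_models`) come from this PR.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator such as Flow (not falcon code)."""

    def __init__(self, max_epochs=100, lr=1e-3, lr_patience=10, use_best_models=True):
        # Phase 1: store configuration only; nothing heavy is built here.
        self.max_epochs = max_epochs
        self.lr = lr
        self.lr_patience = lr_patience
        self.use_best_models = use_best_models
        self.is_set_up = False

    def setup(self, simulator, observations):
        # Phase 2: wire in runtime objects. There is no config= argument;
        # the argument names used here are assumptions for illustration.
        self.simulator = simulator
        self.observations = observations
        self.is_set_up = True
        return self


# A flat YAML section, once loaded, is just a dict whose keys match __init__.
yaml_section = {"max_epochs": 200, "lr": 5e-4}
estimator = ToyEstimator(**yaml_section)
estimator.setup(simulator=lambda theta: theta, observations=[0.0])

# A leftover prefixed name fails immediately instead of being silently ignored.
try:
    ToyEstimator(loop_max_epochs=200)
except TypeError as exc:
    typo_error = str(exc)

print(estimator.max_epochs, estimator.is_set_up)
print(typo_error)
```

Because the signature lists every parameter explicitly, Python itself rejects unknown keys, so no extra validation layer is needed for typos.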

Breaking changes

Estimator `__init__` param names changed (e.g. `loop_max_epochs` → `max_epochs`); existing code using the old names will raise `TypeError`
YAML configs with nested `loop:`/`network:`/`optimizer:`/`inference:` sections need to be flattened; `scheduler_patience` → `lr_patience`, `use_best_models_during_inference` → `use_best_models`
`setup()` no longer accepts `config=` argument
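For anyone migrating an existing config, the flattening and the two renames listed above could be applied with a small helper along these lines. It is a sketch under assumptions: `flatten_estimator_config` is a hypothetical name, the section each key lives in is guessed for the example, and `"some_net"` is a placeholder value.

```python
SECTIONS = ("loop", "network", "optimizer", "inference")
RENAMES = {
    "scheduler_patience": "lr_patience",
    "use_best_models_during_inference": "use_best_models",
}


def flatten_estimator_config(old):
    """Lift keys out of the nested sections and apply the two renames."""
    flat = {}
    for key, value in old.items():
        if key in SECTIONS and isinstance(value, dict):
            items = list(value.items())  # nested section: lift its keys up
        else:
            items = [(key, value)]  # already flat: keep as is
        for name, item in items:
            name = RENAMES.get(name, name)
            if name in flat:
                raise ValueError(f"duplicate key after flattening: {name!r}")
            flat[name] = item
    return flat


# Assumed old-style layout; which section holds which key is illustrative only.
old_config = {
    "loop": {"max_epochs": 200},
    "network": {"net_type": "some_net"},
    "optimizer": {"lr": 1e-3, "scheduler_patience": 8},
    "inference": {"use_best_models_during_inference": True},
}
new_config = flatten_estimator_config(old_config)
print(new_config)
```

The duplicate check matters because two sections could in principle define the same key name, and flattening would otherwise silently keep only one of them.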

Test plan

Smoke tests pass: `01_minimal`, `02_bimodal`, `04_gaussian`
`Flow(**kwargs)` and `GaussianFullCov(**kwargs)` with new param names
`TypeError` raised for unknown kwargs
Programmatic API via notebooks: `falcon.Graph()` + `falcon.launch()`
`run_loader.py` `Run` object post-training

🤖 Generated with Claude Code

Captures the design for a Python-first front door to falcon: flat typed config surface bridged to nested YAML, the product/sum/composite/collection config-shape taxonomy, _target_ resolution unification (Step 0), the init/launch/shutdown Ray lifecycle, the cloudpickle escape hatch for notebook-defined models, JAX process-global-state handling, and the v1 interleaved color-tagged log stream for in-cell output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Step 0 (_target_ unification, net_type → variant classes, NetworkConfig untangle) deferred: no variant-specific hyperparams exposed yet, churn with no functional benefit; net_config: dict={} is the escape hatch if needed
- prior list-syntax: existing list-of-lists form serves as Python API too, no typed-marginal objects needed for v1
- falcon.Simulator base class deferred: duck typing is sufficient for v1
- falcon.session() deferred: not needed before basic API works
- falcon.init(): remove num_cpus/num_gpus, use **ray_init_kwargs passthrough
- falcon.launch(): remove buffer_min_samples etc., model config belongs in Config/overrides; rename posterior_sample -> auto_sample
- falcon.Sequential: dropped, use _input_ nesting instead
- escape hatch: drop source-extraction to _live_objects.py, placeholder "<live object: ClassName>" is sufficient for v1
- example notebooks: .py (jupytext) as source of truth, .ipynb as build artefacts; existing run.py files untouched
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Splits launch_mode into:
- _run_pipeline(cfg, *, auto_sample, timeout, stop_check, log_handler, on_graph_ready, summary_sink): pure training pipeline, no Ray lifecycle, no TUI concerns; injectable stop_check and log_handler; returns output_dir
- launch_mode: thin CLI frontend; owns Ray init, TUI/shutdown-handler setup, stop_check closure, TUI log handler, status polling thread (via on_graph_ready)
CLI behavior is byte-for-byte unchanged. _run_pipeline is now directly callable for the upcoming falcon.launch() Python API.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Tests plain callables, torch simulators, nn.Module subclasses, transitive __main__ deps, global closures (including ~8 MB), and class redefinition. All pass. CUDA tensors stored as instance attributes fail as expected; workaround (store numpy, convert inside forward) confirmed working. Conclusion: cloudpickle + Ray handles all normal notebook simulator patterns. Notebook-defined classes can be passed directly to add_node(); Ray's built-in cloudpickle serializes them transparently to actor processes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- falcon/core/flat_config.py: flat_to_nested, make_flat_signature, apply_flat_signature utilities for prefix-transform config builders
- falcon/api.py: Config wrapper + falcon.config() entry point
- falcon/estimators/flow.py: _FlowConfigBuilder with synthesised flat signature; Flow.__new__ returns builder when called without positional args, real estimator otherwise
- falcon/estimators/gaussian_fullcov.py: rename GaussianPosterior → _GaussianPosterior (implementation detail, not public API)
- falcon/estimators/gaussian.py: add deprecation TODO; update import
- falcon/estimators/__init__.py: remove GaussianPosterior from exports
- falcon/__init__.py: expose falcon.config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- falcon/api.py: add init(), shutdown(), _prepare_config(), launch(). launch() resolves Config/path/dict target, lazily inits Ray, calls _run_pipeline, returns load_run(output_dir); wait=False raises NotImplementedError (deferred to Step 7)
- falcon/__init__.py: expose init, launch, shutdown via lazy imports
- examples/01_minimal/notebook.py: jupytext percent-format notebook covering config load, override, launch, and run inspection (cell story A)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- falcon/core/graph.py: Graph() starts empty (node_list=None default); extract _build(); add add_node() accepting live instances, observed=array, and ray_* kwargs; guard forward_deps.get() for partial graphs
- falcon/core/deployed_graph.py: NodeWrapper skips LazyLoader for live simulator instances (isinstance str/type check)
- falcon/cli.py: _run_pipeline gains graph=/observations= params; when provided, create_graph_from_config is bypassed
- falcon/api.py: _prepare_config handles Graph target by synthesising a default config with _graph_to_config_dict escape-hatch serialization (<live object: ...> / <live array: ...> placeholders); launch() threads prebuilt_graph through to _run_pipeline
- falcon/estimators/flow.py: guard OmegaConf.to_container on None embedding
- falcon/embeddings/builder.py: instantiate_embedding(None) returns _PassthroughEmbedding (identity, casts to float32)
- examples/04_gaussian/notebook.py: jupytext notebook for programmatic API
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Graph._repr_html_: Mermaid flowchart (CDN-loaded) with colour-coded nodes (blue=trainable, green=observed, yellow=deterministic), solid forward edges, dashed evidence edges
- _short_cls_name(): module-level helper for compact class display names
- Run._repr_html_: inline HTML status card showing per-node final loss and epoch count
- Run.plot_metrics(): matplotlib figure of train/val loss curves, one subplot per node
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
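The Mermaid rendering described in this commit can be approximated by a small text generator. The sketch below is an assumption-laden illustration rather than the actual `Graph._repr_html_`: `to_mermaid`, the hex colours, the node names, and the direction of the evidence edge are all made up for the example. Only the colour roles and the solid/dashed edge convention come from the commit message.

```python
# Assumed colour choices for the three roles (blue, green, yellow).
FILL = {"trainable": "#cfe2ff", "observed": "#d1e7dd", "deterministic": "#fff3cd"}


def to_mermaid(nodes, forward_edges, evidence_edges=()):
    """Build Mermaid flowchart source from node roles and two edge lists."""
    lines = ["flowchart TD"]
    for name, kind in nodes.items():
        lines.append(f'    {name}["{name}"]')
        lines.append(f"    style {name} fill:{FILL[kind]}")
    for src, dst in forward_edges:
        lines.append(f"    {src} --> {dst}")  # solid arrow: forward dependency
    for src, dst in evidence_edges:
        lines.append(f"    {src} -.-> {dst}")  # dashed arrow: evidence
    return "\n".join(lines)


diagram = to_mermaid(
    nodes={"theta": "trainable", "obs": "observed"},
    forward_edges=[("theta", "obs")],
    evidence_edges=[("obs", "theta")],
)
print(diagram)
```

In a notebook, a `_repr_html_` method would wrap this text in an HTML element that the Mermaid script renders; that wrapping is omitted here.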

Generated from jupytext percent-format notebook.py sources. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

_launch was async def but contained zero await calls — all blocking was done via ray.get() / ray.wait(). Wrapping it in asyncio.run() broke Jupyter notebooks (which already have a running event loop). Fix: make _launch a plain def and call it directly; remove unused asyncio import. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
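The failure mode is easy to reproduce with the standard library alone. In this sketch `launch_async` and `launch_sync` are hypothetical stand-ins for the old and new `_launch`, and `simulated_notebook_cell` mimics code running inside an already active event loop, as a Jupyter cell does.

```python
import asyncio


async def launch_async():
    # Declared async but never awaits anything, like the old _launch.
    return "done"


def launch_sync():
    # The fix: a plain function that can be called from any context.
    return "done"


async def simulated_notebook_cell():
    # Code in here runs while an event loop is already active.
    coro = launch_async()
    try:
        asyncio.run(coro)  # nested asyncio.run() is rejected
        error = None
    except RuntimeError as exc:
        error = str(exc)
    finally:
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
    return error, launch_sync()


error, result = asyncio.run(simulated_notebook_cell())
print(error)
print(result)
```

Since the function never awaited anything, making it synchronous loses nothing and removes the dependency on who owns the event loop.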

The name "DatasetManager" caused a crash on the second falcon.launch() call in the same notebook session — Ray rejects duplicate actor names. The name was never used for lookup, so dropping it is safe. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

OmegaConf.resolve() was called in-place on the Config's DictConfig, baking the first run's run_dir into all interpolated paths. A second launch() call with a different output dir then inherited the resolved paths from the first run (e.g. paths.graph pointed at run6 even when output was run7). Fix: copy via OmegaConf.merge(cfg, {}) before mutating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
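The same bug pattern can be shown without OmegaConf. The sketch below uses plain dicts and a toy `resolve_in_place` as stand-ins (both hypothetical, as are `prepare` and `fresh`), with `copy.deepcopy` playing the role that `OmegaConf.merge(cfg, {})` plays in the fix.

```python
import copy


def resolve_in_place(cfg):
    """Toy in-place resolve: bakes run_dir into every path template."""
    for key, template in list(cfg["paths"].items()):
        cfg["paths"][key] = template.replace("${run_dir}", cfg["run_dir"])
    return cfg


def prepare(cfg, run_dir, *, copy_first):
    # With copy_first=False the caller's config object is mutated.
    work = copy.deepcopy(cfg) if copy_first else cfg
    work["run_dir"] = run_dir
    return resolve_in_place(work)


def fresh():
    return {"run_dir": None, "paths": {"graph": "${run_dir}/graph"}}


# Buggy: the first launch destroys the template, so the second inherits run6.
shared = fresh()
prepare(shared, "run6", copy_first=False)
buggy = prepare(shared, "run7", copy_first=False)

# Fixed: each launch resolves its own copy; the shared config stays a template.
shared = fresh()
prepare(shared, "run6", copy_first=True)
fixed = prepare(shared, "run7", copy_first=True)

print(buggy["paths"]["graph"])
print(fixed["paths"]["graph"])
```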

Same guard as flow.py: OmegaConf.to_container(None) raises ValueError, so skip it when no embedding config is provided. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When target is a Graph, always set prebuilt_graph regardless of whether a saved config.yml exists. The saved config contains <live object/array> placeholders that create_graph_from_config cannot parse as file paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

__init__ is now a pure dataclass (stores flat config kwargs only). setup() is load-bearing: receives runtime objects from NodeWrapper, merges stored config with YAML-sourced config, and wires everything up.
BaseEstimator:
- __init__(**flat_kwargs): stores self._init_flat_kwargs via flat_to_nested
- __init_subclass__: injects per-class __init__ with flat signature when _CONFIG_SECTIONS is declared, giving autocomplete for free
- setup() declared as abstract
StepwiseEstimator:
- __init__ removed (uses BaseEstimator's)
- setup() initialises common loop state; subclasses set loop_config first
Flow / GaussianFullCov:
- _CONFIG_SECTIONS + _CONFIG_EXTRA_PARAMS replace _FlowConfigBuilder
- __new__ trick and _FlowConfigBuilder removed entirely
- setup() merges defaults < notebook kwargs < YAML/override config
NodeWrapper:
- Detects BaseEstimator instances (notebook path) vs class/string (YAML)
- Always calls .setup() to wire up runtime components
Result: Flow(loop_max_epochs=200) returns a real Flow. isinstance() works. New estimators need only declare _CONFIG_SECTIONS and implement setup().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
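One way the `__init_subclass__` injection described in this commit could work is sketched below. It is a guess at the mechanism, not the falcon implementation: the shape of `_CONFIG_SECTIONS` (section name mapped to a dict of defaults), this version of `flat_to_nested`, and `ToyFlow` are all assumptions made for the example.

```python
import inspect


def flat_to_nested(flat, sections):
    """Turn {'loop_max_epochs': 200} into {'loop': {'max_epochs': 200}}."""
    nested = {name: {} for name in sections}
    for key, value in flat.items():
        for name in sections:
            prefix = name + "_"
            if key.startswith(prefix):
                nested[name][key[len(prefix):]] = value
                break
        else:
            raise TypeError(f"unexpected keyword argument {key!r}")
    return nested


class BaseEstimator:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        sections = cls.__dict__.get("_CONFIG_SECTIONS")
        if not sections:
            return
        # Build a keyword-only signature with one parameter per section key.
        params = [inspect.Parameter("self", inspect.Parameter.POSITIONAL_OR_KEYWORD)]
        for section, defaults in sections.items():
            for key, default in defaults.items():
                params.append(inspect.Parameter(
                    f"{section}_{key}", inspect.Parameter.KEYWORD_ONLY, default=default))
        signature = inspect.Signature(params)

        def __init__(self, **flat_kwargs):
            bound = signature.bind(self, **flat_kwargs)  # TypeError on unknown names
            bound.apply_defaults()
            values = dict(bound.arguments)
            values.pop("self")
            self._init_flat_kwargs = flat_to_nested(values, sections)

        __init__.__signature__ = signature  # what introspection tools can read
        cls.__init__ = __init__

    def setup(self, **runtime_objects):
        raise NotImplementedError


class ToyFlow(BaseEstimator):
    _CONFIG_SECTIONS = {"loop": {"max_epochs": 100}, "optimizer": {"lr": 1e-3}}

    def setup(self, **runtime_objects):
        self.runtime = runtime_objects


est = ToyFlow(loop_max_epochs=200)
print(isinstance(est, ToyFlow), est._init_flat_kwargs)
```

Unlike a `__new__` trick that returns a builder object, the constructor here always returns a real instance, so `isinstance()` checks behave as expected.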

Flow and GaussianFullCov __init__ params are now plain names (max_epochs, net_type, lr, gamma, lr_patience, use_best_models, ...) instead of loop_/network_/optimizer_/inference_ prefixes. deployed_graph.py passes flat YAML dict directly as **kwargs to estimator_cls.__init__; setup() no longer takes a config arg. All example YAML files updated to flat format. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

….yml Notebooks are the source of truth — .py mirror files removed. 04_gaussian notebook updated: GaussianFullCov now instantiated with explicit params matching config.yml (max_epochs, lr, gamma, etc.). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codecov · 2026-06-08T21:13:53Z

Codecov Report

❌ Patch coverage is 1.32827% with 520 lines in your changes missing coverage. Please review.
✅ Project coverage is 9.20%. Comparing base (e22b690) to head (cec04a1).
⚠️ Report is 21 commits behind head on main.

Files with missing lines	Patch %	Lines
falcon/api.py	0.00%	131 Missing ⚠️
falcon/cli.py	0.00%	101 Missing ⚠️
falcon/core/run_loader.py	0.00%	65 Missing ⚠️
falcon/estimators/flow.py	0.00%	60 Missing ⚠️
falcon/core/graph.py	0.00%	53 Missing ⚠️
falcon/estimators/gaussian_fullcov.py	0.00%	40 Missing ⚠️
falcon/estimators/stepwise_base.py	0.00%	36 Missing ⚠️
falcon/core/deployed_graph.py	0.00%	16 Missing ⚠️
falcon/embeddings/builder.py	0.00%	9 Missing ⚠️
falcon/estimators/gaussian.py	0.00%	7 Missing ⚠️
... and 2 more

Additional details and impacted files

@@           Coverage Diff            @@
##            main     #90      +/-   ##
========================================
- Coverage   9.72%   9.20%   -0.53%     
========================================
  Files         33      34       +1     
  Lines       4154    4411     +257     
========================================
+ Hits         404     406       +2     
- Misses      3750    4005     +255

Flag	Coverage Δ
unit	`9.20% <1.32%> (-0.53%)`	⬇️


Update epoch override paths from loop.max_epochs to max_epochs to match the flattened estimator YAML structure introduced in the previous commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Extract _build_optimizer() in GaussianFullCov and LossBasedEstimator to eliminate duplicated optimizer/scheduler setup between _initialize_model and load
- Delete plans/COLAB_API_PLAN.md and plans/spikes/cloudpickle_spike.py
- Drop "TODO: refactor" prefix from proposal bias correction comment
- Remove stale TODO comment in deployed_graph._launch
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cweniger and others added 18 commits May 17, 2026 19:12

Merge branch 'main' into plan/colab-notebook-api

1c807f1

Add notebook.ipynb for 01_minimal and 04_gaussian examples

2971058


Fix None embedding crash in GaussianFullCov._create_model

f826b2b


cweniger and others added 2 commits June 8, 2026 23:19

Fix smoke test OmegaConf overrides for flat estimator config

cca3c18


cweniger merged commit eda9021 into main Jun 8, 2026
4 checks passed

cweniger merged 20 commits into main from plan/colab-notebook-api
