Flag to/not to forward model the posterior samples #76

# Written using Claude Code
# Falcon 0.4.1 — Posterior-predictive forward-modelling flag

A self-contained enhancement, separate from the checkpoint/resume
cluster in `falcon_checkpoint_resume_issues.md`.

---

## Issue — `--forward-model` flag to push posterior samples through the simulator graph

### Summary

After training, `falcon sample posterior` produces samples of the
*latent* parameters only (e.g. `z`, `pw`). To validate the fit
(posterior-predictive check) or generate simulated observations
consistent with the posterior, the user currently has to write a
bespoke script that loads the samples, imports `model.py`, and
threads the latent samples through every downstream simulator node
by hand.

Add a flag — `falcon sample posterior --forward-model` (or
equivalent on `falcon launch`) — that, after drawing posterior
samples, runs the deterministic forward simulators on those samples
and saves the resulting observable-level fields alongside the
latents. The user already declared the simulator chain in
`config.yml`; falcon already has the graph machinery to execute it
in forward order. This issue is about wiring those two pieces
together behind one flag.

### Current behaviour (0.4.1)

- `DeployedGraph.sample_posterior`
  (`falcon/core/deployed_graph.py:683-695`) runs
  `_execute_graph(num_samples, self.graph.backward_order,
  condition_refs, "sample_posterior")`. `backward_order` only
  visits nodes with an estimator — the deterministic forward
  simulators (`gains`, `stacked_uv`, `df`, `bt`, `image`, …) are
  **not** executed.
- The saved NPZ in `{paths.samples}/posterior/{timestamp}/000000.npz`
  contains only the latent keys (`z.value`, `pw.value`,
  `*.log_prob`). It does not contain `image.value`, `obsx.value`,
  `pk_obs.value`, etc.
- `_execute_graph` already supports the forward direction: it is
  called with `self.graph.forward_order` during simulation
  (`deployed_graph.py:828`). What's missing is a path that:
  (a) draws posterior samples for latent nodes, then
  (b) forward-executes the remaining simulators using those
      samples as conditions.
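
To confirm which keys a given sample file actually holds, a quick NumPy check is enough. The sketch below is illustrative only: it writes a small synthetic stand-in file instead of reading a real Falcon run, and the helper `missing_keys` is a hypothetical name, not part of Falcon.

```python
import os
import tempfile

import numpy as np


def missing_keys(npz_path, expected):
    """Return the expected keys that are absent from an NPZ file."""
    with np.load(npz_path) as data:
        present = set(data.files)
    return sorted(set(expected) - present)


# Synthetic stand-in for a latents-only posterior file (values are made up).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "000000.npz")
np.savez(path, **{
    "z.value": np.zeros((4, 2)),
    "z.log_prob": np.zeros(4),
    "pw.value": np.zeros((4, 3)),
    "pw.log_prob": np.zeros(4),
})

absent = missing_keys(path, ["z.value", "pw.value", "image.value", "pk_obs.value"])
print(absent)  # ['image.value', 'pk_obs.value']
```

Pointing `missing_keys` at a real `000000.npz` from `{paths.samples}/posterior/{timestamp}/` should report the observable-level keys as absent, if the behaviour described above holds.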

### Proposed behaviour

1. **New CLI flag** on `falcon sample posterior`:
   ```bash
   falcon sample posterior -o outputs/run --forward-model
   falcon sample posterior -o outputs/run --forward-model --include-nodes image,obsx,pk_obs
   falcon sample posterior -o outputs/run --forward-model --exclude-nodes noiseimage
   ```
   Equivalent YAML knob so the end-of-launch posterior-sampling
   step in `falcon launch` honours it too:
   ```yaml
   sample:
     posterior:
       n: 1500
       forward_model: true                          # default false
       forward_model_nodes: [image, obsx, pk_obs]   # default: all forward nodes
   ```

2. **Execution path** in `DeployedGraph`:
   - Call existing `sample_posterior(n_samples, observations)` to
     get `posterior_refs` (latents only).
   - Extract latent values via `_extract_value_refs(posterior_refs)`
     — same pattern as the proposal-sampling path in
     `deployed_graph.py:819-820`.
   - Call `_execute_graph(n_samples, self.graph.forward_order,
     latent_refs, "sample")` to forward-simulate. Override
     observed nodes: they should be *re-simulated* from the
     posterior latents, not pinned to the on-disk observation
     (that's the whole point of a posterior-predictive check).
   - Merge latent and forward-modelled refs into one batch.

3. **Output layout**:
   ```
   {paths.samples}/posterior/{timestamp}/000000.npz   # latents only (today's behaviour)
   {paths.samples}/posterior_predictive/{timestamp}/000000.npz   # NEW: latents + forward
   ```
   Use a separate subdirectory so an existing posterior-only run
   isn't silently overwritten. The NPZ contains every saved node's
   `value` (and `log_prob` where applicable), keyed by node name.
   This matches the existing sample-file schema, so
   `falcon.read_samples()` should not need schema changes.

4. **Filtering** via `--include-nodes` / `--exclude-nodes`:
   - For large simulator chains (this repo's graph has ~15
     downstream nodes), the full NPZ can be GB-scale. Default
     to all forward nodes; let users prune.
   - Always include the latent nodes (`z`, `pw`) regardless of
     filter, so the NPZ is self-describing.

5. **Reproducibility**: some forward simulators are not purely
   deterministic and use RNGs (this repo's `Noise` class does).
   Accept an optional `--seed` so posterior-predictive runs are
   reproducible.

6. **Console output**:
   ```
   ✓ Drew 1500 posterior samples (latents: z, pw)
   ↻ Forward-modelling through 12 simulator nodes...
   ✓ Saved posterior + forward-modelled samples to outputs/run/samples_dir/posterior_predictive/2026-05-26T15-30/
   ```
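
The execution path, filtering, and seeding described in the list above can be sketched on a toy graph. This is a minimal illustration under stated assumptions, not Falcon's implementation: `forward_model` is a hypothetical helper, the node dependencies are invented for the example (only the node names `z`, `pw`, `image`, `noiseimage`, `pk_obs` come from this issue), and plain NumPy arrays stand in for Falcon's refs and Ray actors.

```python
import numpy as np


def forward_model(latents, forward_nodes, include=None, exclude=(), seed=None):
    """Push latent samples through simulators listed in forward order.

    latents:       dict of node name -> array with a leading sample axis
    forward_nodes: list of (name, parent_names, fn(rng, *parents)),
                   already sorted so parents come before children
    """
    rng = np.random.default_rng(seed)
    values = dict(latents)
    for name, parents, fn in forward_nodes:
        # Every node is re-simulated from the posterior latents,
        # including nodes that are "observed" during training.
        values[name] = fn(rng, *(values[p] for p in parents))

    forward_names = [name for name, _, _ in forward_nodes]
    keep = [n for n in forward_names
            if (include is None or n in include) and n not in exclude]
    # Latents are always kept so the output is self-describing.
    return {n: values[n] for n in list(latents) + keep}


# Toy graph; the dependencies and formulas are invented for illustration.
forward_nodes = [
    ("image", ("z",), lambda rng, z: z ** 2),
    ("noiseimage", ("image", "pw"),
     lambda rng, img, pw: img + pw * rng.standard_normal(img.shape)),
    ("pk_obs", ("noiseimage",), lambda rng, x: x.mean(axis=1)),
]
latents = {
    "z": np.linspace(0.0, 1.0, 12).reshape(4, 3),  # stand-in posterior draws
    "pw": np.full((4, 1), 0.1),
}

out = forward_model(latents, forward_nodes, exclude={"noiseimage"}, seed=0)
print(sorted(out))  # ['image', 'pk_obs', 'pw', 'z']
```

In this sketch the filter only affects what is returned, not what is executed: `noiseimage` is still computed because `pk_obs` depends on it. Whether Falcon should also skip executing nodes that no saved node depends on is left open here.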

### Acceptance criteria

- `falcon sample posterior -o <run> --forward-model` produces an
  NPZ in `posterior_predictive/` containing keys for every node in
  the graph (or the filtered subset).
- Loading via `falcon.read_samples(<run>, kind="posterior_predictive")`
  returns a dict-like object indexed by node name.
- Observed nodes (`V_ref`, `pk_obs` in this repo) appear in the
  output with values *re-simulated* from posterior latents — not
  pinned to the observation NPZ. A test asserts that
  `samples["pk_obs"]` varies across draws.
- `--include-nodes` / `--exclude-nodes` and the YAML equivalent
  filter the saved set as specified.
- `--seed` makes two consecutive runs bit-identical, up to sample
  ordering across Ray actors. The test asserts that `mean` and
  `std` match exactly for deterministic nodes.
- A unit test on `01_minimal` with `--forward-model` asserts the
  NPZ contains both latent and observation-level keys.
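
The "varies across draws" criterion could be checked with a small helper along these lines. It is a sketch on synthetic arrays, not a test against a real run: `varies_across_draws` is a hypothetical name, and the arrays stand in for what loading `samples["pk_obs"]` would return.

```python
import numpy as np


def varies_across_draws(values, atol=0.0):
    """True if at least one draw differs from the first along axis 0."""
    values = np.asarray(values)
    return bool(np.any(np.abs(values - values[:1]) > atol))


rng = np.random.default_rng(0)
resimulated = rng.standard_normal((1500, 8))          # stand-in: re-simulated pk_obs
pinned = np.tile(rng.standard_normal(8), (1500, 1))   # stand-in: pinned observation

assert varies_across_draws(resimulated)
assert not varies_across_draws(pinned)
```

A pinned observation repeats the same row for every draw, so the helper returns `False` for it, which is the failure mode the acceptance test is meant to catch.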

### Out of scope

- Posterior-predictive *p-values* / diagnostic plotting — that's
  a downstream notebook concern.
- Conditional posterior-predictive (e.g. fixing one latent and
  forward-modelling the others). This issue covers a single flag
  with a single mode.
- Streaming / chunked output for very large graphs — initial
  implementation writes one NPZ per run; can be revisited if it
  becomes a memory issue.

### Why this is worth it

The current workaround is to copy the relevant `model.py` classes
into a notebook and re-instantiate them with the right config
constants. That's brittle: every change to `model.py` has to be
mirrored in the analysis notebook, and any device/dtype subtlety
(e.g. the `complex64` casting in `createVreffromgains`) is easy
to get wrong on the analysis side. Putting forward modelling
behind a flag means the same simulator code that trained the
estimator also validates it — no parallel implementation.

