Flag to/not to forward model the posterior samples #76

@JoySanghavi

Written using Claude Code

Falcon 0.4.1 — Posterior-predictive forward-modelling flag

This is a self-contained enhancement, separate from the checkpoint/resume
cluster in falcon_checkpoint_resume_issues.md.


Issue — --forward-model flag to push posterior samples through the simulator graph

Summary

After training, falcon sample posterior produces samples of the
latent parameters only (e.g. z, pw). To validate the fit
(posterior-predictive check) or generate simulated observations
consistent with the posterior, the user currently has to write a
bespoke script that loads the samples, imports model.py, and
threads the latent samples through every downstream simulator node
by hand.

Add a flag — falcon sample posterior --forward-model (or
equivalent on falcon launch) — that, after drawing posterior
samples, runs the deterministic forward simulators on those samples
and saves the resulting observable-level fields alongside the
latents. The user already declared the simulator chain in
config.yml; falcon already has the graph machinery to execute it
in forward order. This issue is about wiring those two pieces
together behind one flag.

Current behaviour (0.4.1)

  • DeployedGraph.sample_posterior
    (falcon/core/deployed_graph.py:683-695) runs
    _execute_graph(num_samples, self.graph.backward_order,
    condition_refs, "sample_posterior"). backward_order only
    visits nodes with an estimator, so the deterministic forward
    simulators (gains, stacked_uv, df, bt, image, …) are
    not executed.
  • The saved NPZ in {paths.samples}/posterior/{timestamp}/000000.npz
    contains only the latent keys (z.value, pw.value,
    *.log_prob). It does not contain image.value, obsx.value,
    pk_obs.value, etc.
  • _execute_graph already supports the forward direction — it's
    called with self.graph.forward_order during simulation
    (deployed_graph.py:828). What's missing is a path that:
    (a) draws posterior samples for latent nodes, then
    (b) forward-executes the remaining simulators using those
    samples as conditions.
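
The gap described in the bullets above can be illustrated with a toy graph. This is a minimal sketch, not falcon's implementation: the Node class, the has_estimator attribute, the edges between nodes, and both helper functions are hypothetical. Only the node names are borrowed from this issue.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    parents: list = field(default_factory=list)
    has_estimator: bool = False  # hypothetical attribute, for illustration only


# Toy graph: node names come from the issue, the edges are assumed.
NODES = [
    Node("z", has_estimator=True),
    Node("pw", has_estimator=True),
    Node("gains", parents=["z"]),
    Node("image", parents=["gains", "pw"]),
    Node("pk_obs", parents=["image"]),
]


def toy_forward_order(nodes):
    """Topological order: every node appears after all of its parents."""
    done, order = set(), []
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(p in done for p in n.parents)]
        if not ready:
            raise ValueError("graph has a cycle or a missing parent")
        for n in ready:
            order.append(n.name)
            done.add(n.name)
        pending = [n for n in pending if n.name not in done]
    return order


def toy_backward_order(nodes):
    """Only nodes that carry an estimator are visited when sampling the posterior."""
    return [n.name for n in nodes if n.has_estimator]


visited = toy_backward_order(NODES)
skipped = [name for name in toy_forward_order(NODES) if name not in visited]
print(skipped)  # ['gains', 'image', 'pk_obs']
```

The skipped nodes are exactly the ones a posterior-predictive pass would need to execute after the latents have been drawn.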

Proposed behaviour

  1. New CLI flag on falcon sample posterior:

    falcon sample posterior -o outputs/run --forward-model
    falcon sample posterior -o outputs/run --forward-model --include-nodes image,obsx,pk_obs
    falcon sample posterior -o outputs/run --forward-model --exclude-nodes noiseimage

    Equivalent YAML knob so the end-of-launch posterior-sampling
    step in falcon launch honours it too:

    sample:
      posterior:
        n: 1500
        forward_model: true                          # default false
        forward_model_nodes: [image, obsx, pk_obs]   # default: all forward nodes

  2. Execution path in DeployedGraph:

    • Call existing sample_posterior(n_samples, observations) to
      get posterior_refs (latents only).
    • Extract latent values via _extract_value_refs(posterior_refs)
      — same pattern as the proposal-sampling path in
      deployed_graph.py:819-820.
    • Call _execute_graph(n_samples, self.graph.forward_order,
      latent_refs, "sample") to forward-simulate. Observed nodes
      must be overridden here: they are re-simulated from the
      posterior latents rather than pinned to the on-disk
      observation, which is the whole point of a
      posterior-predictive check.
    • Merge latent and forward-modelled refs into one batch.

  3. Output layout:

    {paths.samples}/posterior/{timestamp}/000000.npz   # latents only (today's behaviour)
    {paths.samples}/posterior_predictive/{timestamp}/000000.npz   # NEW: latents + forward
    

    Use a separate subdirectory so an existing posterior-only run
    isn't silently overwritten. The NPZ contains every node's
    value (and log_prob where applicable), keyed by node name —
    matching the existing sample-file schema so
    falcon.read_samples() works without changes.

  4. Filtering via --include-nodes / --exclude-nodes:

    • For large simulator chains (this repo's graph has ~15
      downstream nodes), the full NPZ can be GB-scale. Default
      to all forward nodes; let users prune.
    • Always include the latent nodes (z, pw) regardless of
      filter, so the NPZ is self-describing.

  5. Reproducibility: forward simulators may use RNGs (this
    repo's Noise class does). Accept an optional --seed so
    posterior-predictive runs are reproducible.

  6. Console output:

    ✓ Drew 1500 posterior samples (latents: z, pw)
    ↻ Forward-modelling through 12 simulator nodes...
    ✓ Saved posterior + forward-modelled samples to outputs/run/samples_dir/posterior_predictive/2026-05-26T15-30/
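
Steps 2, 4, and 5 above can be sketched together in plain NumPy. This is an illustration under stated assumptions, not falcon code: draw_latents stands in for sample_posterior, FORWARD_CHAIN is a hypothetical three-node simulator chain, and posterior_predictive with its include/exclude/seed arguments is an invented helper that mirrors the proposed flags. It runs in a single process, so ordering across Ray actors is not modelled.

```python
import numpy as np


def draw_latents(n, rng):
    """Stand-in for the posterior-sampling pass: returns latent values only."""
    return {"z": rng.normal(size=(n, 3)), "pw": rng.normal(size=(n, 1))}


# Hypothetical simulator chain in forward order: (node name, parent names, function).
# "noiseimage" draws from the RNG, as a stochastic simulator node would.
FORWARD_CHAIN = [
    ("image", ("z", "pw"), lambda z, pw, rng: z * pw),
    ("noiseimage", ("image",),
     lambda image, rng: image + rng.normal(scale=0.1, size=image.shape)),
    ("pk_obs", ("noiseimage",), lambda noisy, rng: noisy.sum(axis=1)),
]


def posterior_predictive(n, seed=None, include=None, exclude=()):
    rng = np.random.default_rng(seed)       # one seeded generator for the whole run
    latents = draw_latents(n, rng)          # (a) posterior samples for latent nodes
    values = dict(latents)
    for name, parents, fn in FORWARD_CHAIN:  # (b) forward-execute every simulator
        values[name] = fn(*(values[p] for p in parents), rng)

    def keep(name):
        if name in latents:                  # latents are always saved
            return True
        if include is not None and name not in include:
            return False
        return name not in exclude

    # Filtering only affects what is returned; every node still ran above,
    # because downstream nodes may depend on a node that is filtered out.
    return {f"{name}.value": v for name, v in values.items() if keep(name)}


batch = posterior_predictive(1500, seed=0, exclude=("noiseimage",))
print(sorted(batch))  # ['image.value', 'pk_obs.value', 'pw.value', 'z.value']
```

One design point the sketch makes explicit: the include/exclude filter is applied to the saved set, not to execution, so excluding an intermediate node such as noiseimage does not break the nodes downstream of it.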
    

Acceptance criteria

  • falcon sample posterior -o <run> --forward-model produces an
    NPZ in posterior_predictive/ containing keys for every node in
    the graph (or the filtered subset).
  • Loading via falcon.read_samples(<run>, kind="posterior_predictive")
    returns a dict-like object indexed by node name.
  • Observed nodes (V_ref, pk_obs in this repo) appear in the
    output with values re-simulated from posterior latents — not
    pinned to the observation NPZ. A test asserts that
    samples["pk_obs"] varies across draws.
  • --include-nodes / --exclude-nodes and the YAML equivalent
    filter the saved set as specified.
  • --seed makes two consecutive runs reproduce the same samples,
    bit for bit, up to ordering across Ray actors. Since ordering
    may differ, the test asserts that mean and std match exactly
    for deterministic nodes.
  • A unit test on 01_minimal with --forward-model asserts the
    NPZ contains both latent and observation-level keys.
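
The key-presence and "varies across draws" criteria above could be checked along these lines. This is a hedged sketch: write_fake_run is a hypothetical stand-in for the proposed command, check_posterior_predictive is an invented helper, and the file is read with np.load rather than falcon.read_samples, because the kind="posterior_predictive" argument is itself part of this proposal.

```python
import tempfile
from pathlib import Path

import numpy as np


def write_fake_run(root, n=8, seed=0):
    """Hypothetical stand-in for `falcon sample posterior --forward-model`."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 3))
    out_dir = Path(root) / "posterior_predictive" / "2026-05-26T15-30"
    out_dir.mkdir(parents=True)
    path = out_dir / "000000.npz"
    # Keys follow the "<node>.value" naming used in the existing sample files.
    np.savez(path, **{"z.value": z, "pk_obs.value": z.sum(axis=1)})
    return path


def check_posterior_predictive(path, latent_keys, observed_keys):
    """Assert that all expected keys exist and that observed nodes vary."""
    with np.load(path) as samples:
        keys = set(samples.files)
        missing = (set(latent_keys) | set(observed_keys)) - keys
        assert not missing, f"missing keys: {sorted(missing)}"
        for key in observed_keys:
            # A node pinned to the on-disk observation would be constant
            # across draws, so its peak-to-peak range would be zero.
            spread = np.max(np.ptp(samples[key], axis=0))
            assert spread > 0, f"{key} does not vary across draws"
    return keys


with tempfile.TemporaryDirectory() as root:
    npz_path = write_fake_run(root)
    found = check_posterior_predictive(npz_path, ["z.value"], ["pk_obs.value"])
```

In a real test, write_fake_run would be replaced by an invocation of the command on the example run directory, with the same assertions applied to the resulting NPZ.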

Out of scope

  • Posterior-predictive p-values / diagnostic plotting — that's
    a downstream notebook concern.
  • Conditional posterior-predictive (e.g. fixing one latent and
    forward-modelling the others) — single flag, single mode.
  • Streaming / chunked output for very large graphs — initial
    implementation writes one NPZ per run; can be revisited if it
    becomes a memory issue.

Why this is worth it

The current workaround is to copy the relevant model.py classes
into a notebook and re-instantiate them with the right config
constants. That's brittle: every change to model.py has to be
mirrored in the analysis notebook, and any device/dtype subtlety
(e.g. the complex64 casting in createVreffromgains) is easy
to get wrong on the analysis side. Putting forward modelling
behind a flag means the same simulator code that trained the
estimator also validates it — no parallel implementation.
