
Add backend extension-point hooks for out-of-tree FDTD backends #3211

Draft
Luochenghuang wants to merge 13 commits into NanoComp:master from Luochenghuang:backend-hooks

@Luochenghuang
Contributor

Summary

Adds a small process-global function-pointer table (POD layout, no virtual dispatch) that lets an external library, such as a CUDA, ROCm, or vectorized-CPU backend or an instrumentation shim, plug into meep's hot paths at load time without forking the codebase.

All hooks default to null. Null means "fall through to the in-tree CPU implementation." A stock build with no backend loaded behaves bit-identically to one without these hooks. This is asserted by a new test (tests/backend_hooks.cpp) that runs under make check.

What this PR adds

  • src/meep/backend_hooks.hpp: a flat function-pointer struct (meep::backend_hooks) plus two inline helpers (sync_host_if_needed, read_field_at). Hooks: init, cleanup, step, sync_to_host, sync_from_host, read_point, needs_host_sync.
  • src/backend_hooks.cpp: the process-global table, zero-initialized.
  • fields::backend_state and fields_chunk::backend_state: opaque void * slots for backend-owned per-sim and per-chunk state. Upstream meep never inspects them.
  • fields::backend_suspended: when true, the step hook is bypassed and the in-tree CPU step path runs instead. Used by solve_cw, which iterates against host arrays directly.
  • Hook call sites at the points where backends need to interpose: step dispatch (step.cpp), sim lifecycle (fields.cpp), generic readout dispatcher (loop_in_chunks.cpp), get_field (monitor.cpp), dft_norm (dft.cpp), dump/load (fields_dump.cpp), solve_cw (cw_fields.cpp), and the LDOS hot path (dft_ldos.cpp).
  • tests/backend_hooks.cpp: a counting "transparent" backend that implements every hook but defers all work to the CPU. Asserts (a) bit-identical numerical results between baseline and hooked runs, and (b) hooks fire at the expected sites with the expected counts. Runs in make check.
  • doc/docs/Backend_Hooks.md: contract reference covering hook signatures, lifecycle ordering, opaque-state semantics, suspension, MPI contracts (collective step() return, etc.), and a tiny skeleton-backend example.

The total invasive change to existing source files is about 80 lines, each a 1- to 7-line insertion. The bulk of the PR is the new header, the test, and the doc.

What this PR does not change

  • Numerical results for any existing test or simulation. Every hook call is gated on a null-pointer check and runs no code when no backend is installed. The new test asserts this.
  • The public Python or C++ API.
  • The build system's defaults.
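  • Performance for vanilla meep. Each call site adds at most a single null-pointer test on a table that is normally set once at load time, so the branch is trivially predictable. The hot loops in dft_ldos.cpp and monitor.cpp::get_field go through the inline read_field_at helper, which reduces to that test plus a direct array load when no backend is installed. Because meep_backend is a mutable global, the compiler cannot always prove the test is loop-invariant. When it can, loop unswitching (GCC's -funswitch-loops, enabled by default at -O3) hoists the test out of the loop; otherwise it costs one well-predicted branch per cell.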
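A call site does not have to rely on the optimizer to get the single-test behavior. It can load the hook pointer into a local before the loop, as in the sketch below. The `hooks_t` table and the one-argument `read_point(idx)` signature are hypothetical simplifications rather than meep's real types. The two functions agree as long as the hook does not modify the table mid-loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins: meep's real table is meep::backend_hooks
// and its read_point takes (fields, fields_chunk, component, cmp, idx).
using realnum = double;
struct hooks_t {
  realnum (*read_point)(std::ptrdiff_t idx);
};
hooks_t table = {};  // zero-initialized: read_point == nullptr

// Null test written inside the per-cell loop; whether it is hoisted is up to
// the compiler.
realnum sum_checked_per_cell(const std::vector<realnum> &host) {
  realnum s = 0;
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(host.size()); ++i)
    s += table.read_point ? table.read_point(i) : host[i];
  return s;
}

// Same result (assuming the hook does not modify the table during the loop)
// with the test hoisted by hand: the pointer is loaded once, so the
// no-backend loop is a plain array traversal at any optimization level.
realnum sum_hoisted(const std::vector<realnum> &host) {
  realnum s = 0;
  auto rp = table.read_point;
  if (!rp) {
    for (realnum v : host) s += v;
  } else {
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(host.size()); ++i) s += rp(i);
  }
  return s;
}
```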

Why

Today, anyone who wants to add a non-CPU backend to meep has to fork the repository and patch the same ~30 lines in step.cpp, cw_fields.cpp, the readout sites, and the fields class declaration. That fork then needs to be rebased against upstream forever. This PR turns that recurring patch into a stable extension point so backends can ship as out-of-tree libraries pinned to specific meep versions.

The motivating use case is a multi-GPU CUDA/NCCL backend that today lives as a fork; with these hooks, it can move to a sibling repo with no patches against upstream meep.

Hook surface

```cpp
namespace meep {
struct backend_hooks {
  void    (*init)           (fields *f);          // per-sim setup
  void    (*cleanup)        (fields *f);          // per-sim teardown
  bool    (*step)           (fields *f);          // one FDTD timestep; return true to skip CPU path
  void    (*sync_to_host)   (fields *f);          // device -> host arrays
  void    (*sync_from_host) (fields *f);          // host arrays -> device
  realnum (*read_point)     (const fields *f, const fields_chunk *fc,
                             component c, int cmp, ptrdiff_t idx);
  bool    (*needs_host_sync)(const fields *f);    // gate for sync_to_host
};
extern backend_hooks meep_backend;
}
```

All entries default to null. A backend fills in the entries it implements at load time, typically from a function marked __attribute__((constructor)) in its shared library.
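The following is a minimal sketch of a backend that installs itself at load time and keeps per-simulation state in the opaque `backend_state` slot. The `fields` and `backend_hooks` types here are hypothetical cut-down stand-ins so the example compiles on its own. A real backend would include meep's headers and write to `meep::meep_backend`. `__attribute__((constructor))` is a GCC/Clang extension. Zero-initialization of the global happens before any such function runs, so the installed pointers are not overwritten.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal stand-ins for meep::fields and meep::backend_hooks.
struct fields {
  void *backend_state = nullptr;  // opaque slot owned by the backend
};
struct backend_hooks {
  void (*init)(fields *f);
  void (*cleanup)(fields *f);
  bool (*step)(fields *f);
};
backend_hooks meep_backend = {};  // zero-initialized: every hook is null

// Per-simulation state the backend hangs off fields::backend_state.
struct my_state {
  long device_steps = 0;
};

static void my_init(fields *f) { f->backend_state = new my_state; }

static void my_cleanup(fields *f) {
  delete static_cast<my_state *>(f->backend_state);
  f->backend_state = nullptr;
}

static bool my_step(fields *f) {
  auto *s = static_cast<my_state *>(f->backend_state);
  ++s->device_steps;  // a real backend would launch its update kernels here
  return true;        // true: skip the in-tree CPU step path
}

// Runs when the library is loaded, before main().
__attribute__((constructor)) static void install_my_backend() {
  meep_backend.init = my_init;
  meep_backend.cleanup = my_cleanup;
  meep_backend.step = my_step;
}
```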

MPI

The hook table is process-global; each MPI rank has its own. Backends are responsible for any cross-rank coordination (NCCL, MPI, sum_to_all, etc.). Meep never synchronizes hook invocations across ranks.

The doc page spells out the implicit contracts in detail. The most important one: step()'s return value must be collective. If one rank returns true (skip CPU path) and another returns false, the latter will execute step_boundaries, which calls MPI_Sendrecv, while the former will not, and the run will deadlock. A backend that handles some configurations but not others must agree across ranks before returning, or simply always return true once installed.
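The collective-return rule can be modeled in a single process. In this toy sketch, each element of a vector stands for one rank's local verdict on whether the backend can handle the current configuration. A real backend would reduce that verdict across ranks with `MPI_Allreduce` using `MPI_LAND`, or an equivalent NCCL call. Here the reduction is a plain AND so the example runs without MPI. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for an MPI_Allreduce(..., MPI_LAND, ...) over all ranks.
bool all_ranks_agree(const std::vector<bool> &local_ok) {
  return std::all_of(local_ok.begin(), local_ok.end(), [](bool ok) { return ok; });
}

// Unsafe: each rank returns its own local verdict from step().
std::vector<bool> local_step_returns(const std::vector<bool> &local_ok) {
  return local_ok;
}

// Safe: every rank returns the reduced verdict, so either all ranks take the
// CPU path (and enter step_boundaries together) or none do.
std::vector<bool> collective_step_returns(const std::vector<bool> &local_ok) {
  return std::vector<bool>(local_ok.size(), all_ranks_agree(local_ok));
}

// A mix of true and false means the ranks that fell back to the CPU path
// wait in MPI_Sendrecv for partners that never arrive.
bool would_deadlock(const std::vector<bool> &returns) {
  bool any_true = std::find(returns.begin(), returns.end(), true) != returns.end();
  bool any_false = std::find(returns.begin(), returns.end(), false) != returns.end();
  return any_true && any_false;
}
```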

ABI

No formal ABI versioning. Backends are expected to be built against a specific meep version, like Postgres extensions or out-of-tree kernel modules. Appending a new function pointer to the end of the struct keeps already-built backends working: they only write the members they know about, whose offsets are unchanged, and the new trailing member stays null because the global is zero-initialized. Reordering or removing members shifts offsets and breaks ABI for already-built backends.
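The append-only rule can be checked mechanically with `offsetof`. In the sketch below, `hooks_v1`, `hooks_v2`, and `hooks_v2_bad` are hypothetical successive versions of the table, not meep's actual history. Appending a member leaves every earlier offset unchanged. Inserting one in the middle shifts the members that follow it.

```cpp
#include <cassert>
#include <cstddef>

struct fields;  // only pointers are used, so an incomplete type suffices

// What an already-built backend was compiled against.
struct hooks_v1 {
  void (*init)(fields *);
  void (*cleanup)(fields *);
  bool (*step)(fields *);
};

// A newer meep that appends one hook at the end.
struct hooks_v2 {
  void (*init)(fields *);
  void (*cleanup)(fields *);
  bool (*step)(fields *);
  bool (*needs_host_sync)(const fields *);  // appended
};

// Existing members keep their offsets, so stores made by an old backend land
// in the right slots; the appended slot stays null in a zero-initialized global.
static_assert(offsetof(hooks_v1, init) == offsetof(hooks_v2, init), "init moved");
static_assert(offsetof(hooks_v1, cleanup) == offsetof(hooks_v2, cleanup), "cleanup moved");
static_assert(offsetof(hooks_v1, step) == offsetof(hooks_v2, step), "step moved");

// Inserting in the middle instead shifts every later member.
struct hooks_v2_bad {
  void (*init)(fields *);
  bool (*needs_host_sync)(const fields *);  // inserted, not appended
  void (*cleanup)(fields *);
  bool (*step)(fields *);
};
```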

Open question

Naming: meep_backend_* (current), meep::backend::*, or something else. Happy to rename.

Luochenghuang and others added 13 commits April 29, 2026 14:02
Introduces a small C-ABI table of function pointers that an external
backend library (e.g. CUDA, ROCm, vectorized-CPU) can install at load
time to redirect hot paths through its own implementation.  All hooks
default to null; null means "fall through to the in-tree CPU code", so
a stock build with no backend loaded behaves bit-identically to one
without these hooks.

This commit lands the foundation:

  * src/meep/backend_hooks.hpp  — flat C-ABI hook table + helpers
  * src/backend_hooks.cpp       — the process-global table (zeroed)
  * src/meep.hpp                — `void *backend_state` slot on
                                  `fields` and `fields_chunk`, opaque
                                  storage owned by the backend
  * src/fields.cpp              — call `meep_backend.init` at end of
                                  `fields` ctor (primary + copy) and
                                  `meep_backend.cleanup` at start of
                                  `~fields`, while chunks are still alive
  * src/step.cpp                — step-dispatch hook: if a backend
                                  returns true from `step(this)`, the
                                  CPU step path is skipped
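The lifecycle ordering above can be sketched with a toy `fields` class. These are hypothetical stand-ins, not meep's classes. `init` is the last statement of the constructor, so the chunks already exist. `cleanup` is the first statement of the destructor, and because member destructors run only after the destructor body, the chunks are still alive when it runs. `step` falls through to the CPU path unless a hook returns true.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct fields;
struct backend_hooks {
  void (*init)(fields *f);
  void (*cleanup)(fields *f);
  bool (*step)(fields *f);
};
backend_hooks meep_backend = {};

struct fields_chunk {
  std::vector<double> f;
};

struct fields {
  std::vector<std::unique_ptr<fields_chunk>> chunks;
  void *backend_state = nullptr;
  int cpu_steps = 0;

  explicit fields(int nchunks) {
    for (int i = 0; i < nchunks; ++i) chunks.emplace_back(new fields_chunk());
    // Last thing in the constructor: chunks exist, so init can mirror them.
    if (meep_backend.init) meep_backend.init(this);
  }
  ~fields() {
    // First thing in the destructor: `chunks` is destroyed only after this
    // body returns, so cleanup can still walk the chunks.
    if (meep_backend.cleanup) meep_backend.cleanup(this);
  }
  void step() {
    if (meep_backend.step && meep_backend.step(this)) return;  // backend did the step
    ++cpu_steps;  // stands in for the in-tree CPU update and step_boundaries
  }
};
```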

Follow-up commit will add `sync_host_if_needed` calls at the CPU
readout sites (DFT, monitors, integration, dump/load, CW solver) so
backends never have to patch upstream files.
Now that backends can install hooks (foundation commit), call them at
each upstream code path that reads or writes the canonical host
arrays.  No-op when no backend is loaded.

  * src/loop_in_chunks.cpp  — sync_host_if_needed at the entry of
                              fields::loop_in_chunks (covers integrate,
                              energy/flux, DFT add/output chunkloops,
                              and every other loop-in-chunks consumer)
  * src/monitor.cpp         — sync_host_if_needed at the chunk-iterating
                              leaf of fields::get_field; the other
                              get_field overloads funnel through here
  * src/dft.cpp             — sync_host_if_needed at fields::dft_norm
                              (reads DFT accumulators directly)
  * src/fields_dump.cpp     — sync_host_if_needed before fields::dump
                              writes HDF5; sync_from_host after
                              fields::load reads HDF5 back in
  * src/cw_fields.cpp       — sync_host_if_needed at the start of
                              fields::solve_cw; sync_from_host at the
                              end so normal stepping resumes against
                              the converged solution
  * src/step.cpp            — guard the step-dispatch hook with
                              !doing_solve_cw so the backend stays
                              suspended during CW iterations
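The dump/load pair needs syncs in both directions. Before writing, the host must catch up with the device. After reading, the device must be refreshed from the host. The sketch below is a hypothetical model that makes this concrete: a second vector plays the device copy, a flag tracks host staleness, and `dump`/`load` stand in for the HDF5 paths. None of these names are meep's.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: the backend keeps the authoritative fields on the
// "device" (here just another vector), and the host copy can go stale.
struct fields {
  std::vector<double> host, device;
  bool host_stale = false;
};

void sync_to_host(fields *f) {
  f->host = f->device;
  f->host_stale = false;
}
void sync_from_host(fields *f) {
  f->device = f->host;
  f->host_stale = false;  // both copies now agree
}
void sync_host_if_needed(fields *f) {
  if (f->host_stale) sync_to_host(f);
}

// dump: bring the host up to date before serializing it.
std::vector<double> dump(fields *f) {
  sync_host_if_needed(f);
  return f->host;  // stands in for writing HDF5
}

// load: overwrite the host arrays, then push them to the device so the next
// backend step starts from the loaded state.
void load(fields *f, const std::vector<double> &saved) {
  f->host = saved;  // stands in for reading HDF5
  sync_from_host(f);
}
```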

After this commit, an external backend that implements
`init`, `cleanup`, `step`, `sync_to_host`, `sync_from_host`,
`is_active`, and `host_is_stale` should be able to run a full meep
simulation without patching any upstream files.
The previous version's `if (!doing_solve_cw && meep_backend.step && ...)`
assumed `doing_solve_cw` was visible from `fields::step()`.  On clean
upstream meep that field is private, so the build failed with:

  step.cpp: error: 'doing_solve_cw' was not declared in this scope

Move the suppression to the caller: `solve_cw` saves and nulls
`meep_backend.step` for the duration of its CG loop, then restores it
before the final sync_from_host.  Same effect, no coupling to upstream
private state.
…redicate)

Changes the hook table shape based on review notes:

  * Add `read_point` hook for fast single-cell field reads.  Avoids
    per-step full-grid downloads on the LDOS / point-monitor hot path.
    Fall back to sync_to_host + direct array read when the hook is null.

  * Add `bool fields::backend_suspended` flag.  When true, the step hook
    is bypassed and the in-tree CPU step path runs instead.  Replaces
    the global save/null/restore of `meep_backend.step` that solve_cw
    previously did, which was fragile under any concurrent access to
    the backend table.

  * Collapse `is_active` and `host_is_stale` into a single
    `needs_host_sync` predicate.  A backend that's not active returns
    false; a backend whose host arrays are already in sync returns
    false; otherwise true.  Simpler call-site logic, smaller struct.

  * Rework header dependency direction.  `meep/backend_hooks.hpp` now
    includes `meep.hpp` (rather than the other way round), which lets
    the hook signature use real meep types (`realnum`, `component`,
    `fields`, `fields_chunk`) without typedef gymnastics.  The 8 source
    files that use a hook now `#include "meep/backend_hooks.hpp"`
    directly.

  * `dft_ldos.cpp` is wired through the new `read_point` hook with a
    fallback to direct array reads when no backend implements it.
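A sketch of how the revised pieces might fit together is shown below. It assumes that `sync_host_if_needed` treats a missing predicate or a missing `sync_to_host` as "nothing to do". The real helper's body lives in `meep/backend_hooks.hpp` and may differ. The types are cut-down stand-ins. `suspend_backend` is a hypothetical RAII guard mirroring what solve_cw does: sync to host, suspend, iterate on host arrays, then push back with `sync_from_host`.

```cpp
#include <cassert>

struct fields;
struct backend_hooks {
  bool (*step)(fields *f);
  void (*sync_to_host)(fields *f);
  void (*sync_from_host)(fields *f);
  bool (*needs_host_sync)(const fields *f);
};
backend_hooks meep_backend = {};

struct fields {
  bool backend_suspended = false;
  int cpu_steps = 0;
};

// One predicate gates the sync; a null predicate or sync hook means no-op.
inline void sync_host_if_needed(fields *f) {
  if (meep_backend.needs_host_sync && meep_backend.sync_to_host &&
      meep_backend.needs_host_sync(f))
    meep_backend.sync_to_host(f);
}

// While suspended, the step hook is not consulted and the CPU path runs.
inline void step(fields *f) {
  if (!f->backend_suspended && meep_backend.step && meep_backend.step(f)) return;
  ++f->cpu_steps;  // stands in for the in-tree CPU step
}

// Hypothetical guard: sync and suspend on entry, resume and push on exit.
struct suspend_backend {
  fields *f;
  explicit suspend_backend(fields *f_) : f(f_) {
    sync_host_if_needed(f);
    f->backend_suspended = true;
  }
  ~suspend_backend() {
    f->backend_suspended = false;
    if (meep_backend.sync_from_host) meep_backend.sync_from_host(f);
  }
};
```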

No behavior change for vanilla meep: every hook still defaults to null,
every call site is still a null-pointer test.
Adds tests/backend_hooks.cpp -- a counting "transparent" backend that
implements every hook but defers all real work to the CPU path.  The
test asserts:

  1. Numerical results are bit-identical between a baseline run (no
     backend) and one with the transparent backend installed.  This is
     the load-bearing claim for backend authors: the hook surface
     itself doesn't perturb values.

  2. Hooks fire at the expected sites: `init` exactly once on
     construction, `cleanup` exactly once on destruction, `step` once
     per fields::step() call, sync_to_host never (since the
     needs_host_sync predicate returns false), read_point never (since
     the hook is left null).

This test runs in `make check` alongside the other in-tree tests.
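The shape of such a test can be sketched without meep. A toy 1D leapfrog-style update on host arrays stands in for a real simulation. A hypothetical counting backend records each hook call and always returns false from `step`, so the CPU path runs. The baseline run and the hooked run then execute the same floating-point operations in the same order, so their results can be compared for exact equality. This is a self-contained illustration, not the contents of tests/backend_hooks.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct fields;
struct backend_hooks {
  void (*init)(fields *f);
  void (*cleanup)(fields *f);
  bool (*step)(fields *f);
};
backend_hooks meep_backend = {};

// Toy simulation: 1D leapfrog-style update on host arrays.
struct fields {
  std::vector<double> e, h;
  explicit fields(int n) : e(n, 0.0), h(n, 0.0) {
    e[n / 2] = 1.0;  // initial pulse
    if (meep_backend.init) meep_backend.init(this);
  }
  ~fields() {
    if (meep_backend.cleanup) meep_backend.cleanup(this);
  }
  void step() {
    if (meep_backend.step && meep_backend.step(this)) return;
    for (std::size_t i = 0; i + 1 < e.size(); ++i) h[i] += 0.5 * (e[i + 1] - e[i]);
    for (std::size_t i = 1; i < e.size(); ++i) e[i] += 0.5 * (h[i] - h[i - 1]);
  }
};

// Transparent backend: counts every call and always defers to the CPU path.
struct hook_counts {
  int init = 0, cleanup = 0, step = 0;
};
hook_counts counts;
void count_init(fields *) { ++counts.init; }
void count_cleanup(fields *) { ++counts.cleanup; }
bool count_step(fields *) {
  ++counts.step;
  return false;  // false: run the CPU path
}

std::vector<double> run(int nsteps) {
  fields f(16);
  for (int t = 0; t < nsteps; ++t) f.step();
  return f.e;
}
```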
doc/docs/Backend_Hooks.md describes the hook contract: the table
shape, lifecycle ordering, opaque-state semantics, per-hook contracts,
suspension, and a minimal "transparent" backend example.

Kept intentionally minimal: ~100 lines.  A longer-form walkthrough
(real backend implementation, debugging, performance pitfalls) can be
written as a follow-up doc PR if there's interest.
Previous version aborted in CI.  Likely culprits removed:

  * Used `gv.surroundings()` -> `field_energy_in_box(volume)` chain;
    swapped for the no-arg `f.field_energy()` which goes through
    `user_volume.surroundings()` itself.
  * Used `f.use_real_fields()` (couples the test to simulation state; aborts under Bloch-periodic boundaries).
  * Used `continuous_src_time` (energy can grow unboundedly).

New shape mirrors `tests/one_dimensional.cpp`: `volone(6.0, 10.0)`,
gaussian pulse via the explicit-args `add_point_source` overload,
100 timesteps, compare `field_energy()`.  Also relaxes the strict
step-count equality to `>=` so any internal step() calls don't
trip the assertion.
Without this, single-cell field queries (mode analysis, harminv,
point monitors) fall through to sync_host_if_needed, which triggers
a full-grid host sync per query when a backend keeps fields on
device.  With the hook installed, the read goes directly to the
backend's shadow storage without a sync.

Falls back to the host-array path when no read_point hook is
installed (i.e., bit-identical to the previous behavior for vanilla
meep and for backends that don't implement the hook).

Mirrors the existing wiring in dft_ldos.cpp.
Lifts the read_point + fallback pattern that was duplicated in
dft_ldos.cpp and monitor.cpp into a single inline helper:

  inline realnum read_field_at(const fields *f, const fields_chunk *fc,
                               component c, int cmp, ptrdiff_t idx);

If a backend has installed `read_point`, route through it; otherwise
read `fc->f[c][cmp][idx]` directly.  Caller still calls
`sync_host_if_needed` once before the loop when no point-read hook
is available.

Both sites collapse from a ternary-per-cell to a single function call.
No behavior change.
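A sketch of the helper's routing logic is shown below. The `component` enum, the `fields_chunk::f[c][cmp]` pointer array, and `realnum` are hypothetical stand-ins modeled on the expression `fc->f[c][cmp][idx]` quoted above. The real definitions come from meep.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for meep's realnum, component, and fields_chunk.
using realnum = double;
enum component { Ex = 0, Ey = 1, NUM_COMPONENTS = 2 };
struct fields;
struct fields_chunk {
  realnum *f[NUM_COMPONENTS][2];  // host array per component, real/imag part
};
struct backend_hooks {
  realnum (*read_point)(const fields *f, const fields_chunk *fc, component c,
                        int cmp, std::ptrdiff_t idx);
};
backend_hooks meep_backend = {};

// Route through the backend's point read when one is installed; otherwise
// read the host array directly. Callers without a point-read hook are
// expected to call sync_host_if_needed once before their loop.
inline realnum read_field_at(const fields *f, const fields_chunk *fc, component c,
                             int cmp, std::ptrdiff_t idx) {
  if (meep_backend.read_point) return meep_backend.read_point(f, fc, c, cmp, idx);
  return fc->f[c][cmp][idx];
}
```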
Adds an "MPI" section to doc/docs/Backend_Hooks.md spelling out the
implicit cross-rank contracts that backends must respect:

  1. step() return value must be collective (otherwise MPI deadlock
     in step_boundaries on the rank that took the CPU fallback).
  2. backend_suspended must be toggled collectively.
  3. init/cleanup run in collective contexts (safe to do MPI/NCCL
     setup/teardown there).
  4. read_point is local-only (call sites already check is_mine()).

Also notes that sync_to_host / sync_from_host are per-rank.

Doc-only change.
@stevengj
Collaborator

stevengj commented May 1, 2026

> This PR turns that recurring patch into a stable extension point so backends can ship as out-of-tree libraries pinned to specific meep versions.

I'm not sure the goal of Meep development is to make it easier to maintain private forks?

Note that you can't "ship" proprietary out-of-tree backends outside of your company due to the GPL, though of course you can use modified private forks in-house without distributing.
