Add backend extension-point hooks for out-of-tree FDTD backends by Luochenghuang · Pull Request #3211 · NanoComp/meep

Luochenghuang · 2026-04-30T21:58:07Z

Summary

Adds a small process-global function-pointer table (POD layout, no virtual dispatch) that lets an external library, such as a CUDA, ROCm, or vectorized-CPU backend or an instrumentation shim, plug into meep's hot paths at load time without forking the codebase.

All hooks default to null. Null means "fall through to the in-tree CPU implementation." A stock build with no backend loaded behaves bit-identically to one without these hooks. This is asserted by a new test (tests/backend_hooks.cpp) that runs under make check.

What this PR adds

src/meep/backend_hooks.hpp: a flat function-pointer struct (meep::backend_hooks) plus two inline helpers (sync_host_if_needed, read_field_at). Hooks: init, cleanup, step, sync_to_host, sync_from_host, read_point, needs_host_sync.
src/backend_hooks.cpp: the process-global table, zero-initialized.
fields::backend_state and fields_chunk::backend_state: opaque void * slots for backend-owned per-sim and per-chunk state. Upstream meep never inspects them.
fields::backend_suspended: when true, the step hook is bypassed and the in-tree CPU step path runs instead. Used by solve_cw, which iterates against host arrays directly.
Hook call sites at the points where backends need to interpose: step dispatch (step.cpp), sim lifecycle (fields.cpp), generic readout dispatcher (loop_in_chunks.cpp), get_field (monitor.cpp), dft_norm (dft.cpp), dump/load (fields_dump.cpp), solve_cw (cw_fields.cpp), and the LDOS hot path (dft_ldos.cpp).
tests/backend_hooks.cpp: a counting "transparent" backend that implements every hook but defers all work to the CPU. Asserts (a) bit-identical numerical results between baseline and hooked runs, and (b) hooks fire at the expected sites with the expected counts. Runs in make check.
doc/docs/Backend_Hooks.md: contract reference covering hook signatures, lifecycle ordering, opaque-state semantics, suspension, MPI contracts (collective step() return, etc.), and a tiny skeleton-backend example.
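To show roughly how the step dispatch and the backend_suspended flag interact, here is a minimal sketch. The `sketch::fields` and `sketch::backend_hooks` types are stand-ins that model only the members involved, not meep's real classes. The exact condition order in step.cpp may differ.

```cpp
#include <cassert>

// Stand-in types: in real meep these come from meep.hpp and
// meep/backend_hooks.hpp. Only the members this sketch needs are modeled.
namespace sketch {

struct fields;

struct backend_hooks {
  bool (*step)(fields *f); // return true to skip the CPU path
};
backend_hooks meep_backend = {}; // process-global, zero-initialized

struct fields {
  bool backend_suspended = false;
  int cpu_steps = 0; // how often the in-tree CPU path ran

  void cpu_step() { ++cpu_steps; } // placeholder for the real update + boundaries

  void step() {
    // Suspended (e.g. inside solve_cw) or no hook installed: use the CPU path.
    if (!backend_suspended && meep_backend.step && meep_backend.step(this)) return;
    cpu_step();
  }
};

} // namespace sketch

// Hypothetical backend step that claims every timestep.
int backend_steps = 0;
bool device_step(sketch::fields *) {
  ++backend_steps;
  return true;
}
```

Because the suspension flag is tested first, a suspended simulation never calls into the backend.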

Total invasive change to existing source files: ~80 lines, all 1 to 7 line insertions. The bulk of the PR is the new header, the test, and the doc.

What this PR does not change

Numerical results for any existing test or simulation. Every hook call is gated on a null-pointer check and runs no code when no backend is installed. The new test asserts this.
The public Python or C++ API.
The build system's defaults.
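Performance for vanilla meep. Each call site adds at most a single null-pointer test. The pointer does not change during a run, so the branch is effectively always predicted correctly. The hot paths inside dft_ldos.cpp and monitor.cpp::get_field use the inline read_field_at helper, which reduces to a direct array load when no backend is installed. The null test itself still runs at run time because the table is a mutable global. With -O2 or higher, however, inlining plus loop unswitching can hoist the test out of the per-cell loop, so it is evaluated once per loop rather than once per cell.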
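To make the unswitching point concrete, the sketch below writes out both loop shapes by hand: the per-cell form a caller writes, and the hoisted form an optimizer may produce from it. The simplified `read_point_fn` signature (an array plus an index) is an assumption for illustration, not the hook's real signature.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for the read_point hook: array + index only.
using read_point_fn = double (*)(const double *data, std::ptrdiff_t idx);

read_point_fn read_point_hook = nullptr; // null unless a backend installs it

// Per-cell form: one null-pointer test per iteration.
double sum_per_cell(const double *data, std::ptrdiff_t n) {
  double s = 0;
  for (std::ptrdiff_t i = 0; i < n; ++i)
    s += read_point_hook ? read_point_hook(data, i) : data[i];
  return s;
}

// Unswitched form: the test is hoisted out of the loop, so the
// no-backend branch is a plain sequence of array loads.
double sum_unswitched(const double *data, std::ptrdiff_t n) {
  double s = 0;
  if (read_point_hook) {
    for (std::ptrdiff_t i = 0; i < n; ++i) s += read_point_hook(data, i);
  } else {
    for (std::ptrdiff_t i = 0; i < n; ++i) s += data[i];
  }
  return s;
}
```

Both forms sum in the same order, so they produce identical results. Whether a given compiler actually unswitches the first form depends on whether it can prove the hook pointer is loop-invariant.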

Why

Today, anyone who wants to add a non-CPU backend to meep has to fork the repository and patch the same ~30 lines in step.cpp, cw_fields.cpp, the readout sites, and the fields class declaration. That fork then needs to be rebased against upstream forever. This PR turns that recurring patch into a stable extension point so backends can ship as out-of-tree libraries pinned to specific meep versions.

The motivating use case is a multi-GPU CUDA/NCCL backend that today lives as a fork; with these hooks, it can move to a sibling repo with no patches against upstream meep.

Hook surface

```cpp
#include "meep.hpp" // fields, fields_chunk, component, realnum

namespace meep {
struct backend_hooks {
  void    (*init)           (fields *f);          // per-sim setup
  void    (*cleanup)        (fields *f);          // per-sim teardown
  bool    (*step)           (fields *f);          // one FDTD timestep; return true to skip CPU path
  void    (*sync_to_host)   (fields *f);          // device -> host arrays
  void    (*sync_from_host) (fields *f);          // host arrays -> device
  realnum (*read_point)     (const fields *f, const fields_chunk *fc,
                             component c, int cmp, ptrdiff_t idx);
  bool    (*needs_host_sync)(const fields *f);    // gate for sync_to_host
};
extern backend_hooks meep_backend;
}
```

All hooks default to null. A backend populates the table at load time, typically from a function marked __attribute__((constructor)).
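Here is a minimal sketch of load-time installation, assuming a GCC- or Clang-compatible compiler for `__attribute__((constructor))`. The `sketch::` table and the `my_*` functions are hypothetical stand-ins for the real `meep_backend` global and a backend's entry points.

```cpp
#include <cassert>

namespace sketch {
struct fields; // stand-in; a real backend uses meep::fields from meep.hpp

struct backend_hooks {
  void (*init)(fields *f);
  void (*cleanup)(fields *f);
  bool (*step)(fields *f);
};
backend_hooks meep_backend = {}; // all null until a backend loads
} // namespace sketch

// Hypothetical backend entry points.
static void my_init(sketch::fields *) {}
static void my_cleanup(sketch::fields *) {}
static bool my_step(sketch::fields *) { return false; } // defer to the CPU path

// GCC/Clang extension: runs when the shared library is loaded (or, for an
// object linked into the executable, before main).
__attribute__((constructor)) static void install_backend() {
  sketch::meep_backend.init = my_init;
  sketch::meep_backend.cleanup = my_cleanup;
  sketch::meep_backend.step = my_step;
}
```

The table is a zero-initialized aggregate, so it is constant-initialized before any constructor function runs. The backend's assignments therefore cannot be overwritten later by static initialization of the global itself.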

MPI

The hook table is process-global; each MPI rank has its own. Backends are responsible for any cross-rank coordination (NCCL, MPI, sum_to_all, etc.). Meep never synchronizes hook invocations across ranks.

The doc page spells out the implicit contracts in detail. The most important one: step()'s return value must be collective. If one rank returns true (skip CPU path) and another returns false, the latter will execute step_boundaries, which calls MPI_Sendrecv, while the former will not, and the run will deadlock. A backend that handles some configurations but not others must agree across ranks before returning, or simply always return true once installed.
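One way to satisfy the collective-return rule is to combine each rank's local "can I handle this step?" answer with a logical AND, and have every rank return the result. The sketch simulates the ranks with a vector so it runs without MPI. The `MPI_Allreduce` call in the comment shows what a real backend might use instead; `comm` stands for whatever communicator the backend shares with meep, which is an assumption.

```cpp
#include <cassert>
#include <vector>

// One entry per simulated rank: can that rank's backend handle this step?
// In a real backend each rank knows only its own vote, and the agreement
// would come from a collective such as
//   int mine = can_handle, all = 0;
//   MPI_Allreduce(&mine, &all, 1, MPI_INT, MPI_LAND, comm);
bool all_ranks_agree(const std::vector<bool> &can_handle) {
  for (bool v : can_handle)
    if (!v) return false;
  return true;
}

// Value every rank returns from its step hook. It is identical on all
// ranks, so either no rank or every rank enters step_boundaries.
std::vector<bool> step_hook_returns(const std::vector<bool> &can_handle) {
  return std::vector<bool>(can_handle.size(), all_ranks_agree(can_handle));
}
```

With mixed configurations this sends every rank down the CPU path, giving up some speed, but it never leaves one rank blocked in MPI_Sendrecv waiting for a peer.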

ABI

No formal ABI versioning. Backends are expected to be built against a specific meep version, like Postgres extensions or out-of-tree kernel modules. Appending a new function pointer at the end of the struct does not break already-built backends: existing members keep their offsets, and the new member stays null because the global is zero-initialized. Reordering or removing fields shifts offsets and breaks ABI for already-built backends.
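The append-only rule can be checked mechanically. In this sketch, `hooks_v1` and `hooks_v2` are hypothetical successive versions of the table. The `static_assert`s confirm that existing members keep their offsets when a member is appended, and the static-storage global shows the new trailing pointer starting out null.

```cpp
#include <cassert>
#include <cstddef>

struct fields; // stand-in for meep::fields

// Layout an already-built backend was compiled against (hypothetical "v1").
struct hooks_v1 {
  void (*init)(fields *);
  bool (*step)(fields *);
};

// A later meep appends a hook at the end (hypothetical "v2").
struct hooks_v2 {
  void (*init)(fields *);
  bool (*step)(fields *);
  void (*new_hook)(fields *);
};

// Existing members keep their offsets, so an old backend that writes
// init/step still lands in the right slots.
static_assert(offsetof(hooks_v1, init) == offsetof(hooks_v2, init), "init moved");
static_assert(offsetof(hooks_v1, step) == offsetof(hooks_v2, step), "step moved");

// Static storage duration: every member, including the new one, starts null.
hooks_v2 meep_backend_v2;
```

If `new_hook` were inserted between `init` and `step` instead, `step`'s offset would shift and the second `static_assert` would fail. That is the ABI break described above.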

Open question

Naming. meep_backend_* (current) vs meep::backend::* vs something else. Happy to rename.

Introduces a small C-ABI table of function pointers that an external backend library (e.g. CUDA, ROCm, vectorized-CPU) can install at load time to redirect hot paths through its own implementation. All hooks default to null; null means "fall through to the in-tree CPU code", so a stock build with no backend loaded behaves bit-identically to one without these hooks.

This commit lands the foundation:

* src/meep/backend_hooks.hpp: flat C-ABI hook table + helpers
* src/backend_hooks.cpp: the process-global table (zeroed)
* src/meep.hpp: `void *backend_state` slot on `fields` and `fields_chunk`, opaque storage owned by the backend
* src/fields.cpp: call `meep_backend.init` at the end of the `fields` ctor (primary + copy) and `meep_backend.cleanup` at the start of `~fields`, while chunks are still alive
* src/step.cpp: step-dispatch hook; if a backend returns true from `step(this)`, the CPU step path is skipped

A follow-up commit will add `sync_host_if_needed` calls at the CPU readout sites (DFT, monitors, integration, dump/load, CW solver) so backends never have to patch upstream files.
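The lifecycle ordering in this commit places init last in construction and cleanup first in destruction. That ordering lets a backend allocate per-simulation state into the opaque `backend_state` slot and free it while the chunks still exist. The sketch below uses stand-in types. `device_state`, `demo_init`, and `demo_cleanup` are hypothetical backend code. The copy constructor is deleted only to keep the sketch short; the commit also calls init from meep's copy constructor.

```cpp
#include <cassert>

namespace sketch {
struct fields;
struct backend_hooks {
  void (*init)(fields *f);
  void (*cleanup)(fields *f);
};
backend_hooks meep_backend = {};

struct fields {
  void *backend_state = nullptr; // opaque, owned by the backend

  fields() {
    // ... chunks and host arrays would be built here ...
    if (meep_backend.init) meep_backend.init(this); // last step of the ctor
  }
  ~fields() {
    if (meep_backend.cleanup) meep_backend.cleanup(this); // first step of the dtor
    // ... chunks would be destroyed here ...
  }
  fields(const fields &) = delete; // avoids double-freeing backend_state in this sketch
};
} // namespace sketch

// Hypothetical backend with per-simulation state.
struct device_state {
  int steps = 0;
};
int live_states = 0;

void demo_init(sketch::fields *f) {
  f->backend_state = new device_state;
  ++live_states;
}
void demo_cleanup(sketch::fields *f) {
  delete static_cast<device_state *>(f->backend_state);
  f->backend_state = nullptr;
  --live_states;
}
```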

Now that backends can install hooks (foundation commit), call them at each upstream code path that reads or writes the canonical host arrays. No-op when no backend is loaded.

* src/loop_in_chunks.cpp: sync_host_if_needed at the entry of fields::loop_in_chunks (covers integrate, energy/flux, DFT add/output chunkloops, and every other loop-in-chunks consumer)
* src/monitor.cpp: sync_host_if_needed at the chunk-iterating leaf of fields::get_field; the other get_field overloads funnel through here
* src/dft.cpp: sync_host_if_needed at fields::dft_norm (reads DFT accumulators directly)
* src/fields_dump.cpp: sync_host_if_needed before fields::dump writes HDF5; sync_from_host after fields::load reads HDF5 back in
* src/cw_fields.cpp: sync_host_if_needed at the start of fields::solve_cw; sync_from_host at the end so normal stepping resumes against the converged solution
* src/step.cpp: guard the step-dispatch hook with !doing_solve_cw so the backend stays suspended during CW iterations

After this commit, an external backend that implements `init`, `cleanup`, `step`, `sync_to_host`, `sync_from_host`, `is_active`, and `host_is_stale` should be able to run a full meep simulation without patching any upstream files.
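For reference, here is a minimal sketch of what a `sync_host_if_needed` helper could look like. It is written against the final `needs_host_sync` predicate rather than this commit's `is_active`/`host_is_stale` pair. The stand-in `fields` carries a stale flag and a download counter only so the behavior can be observed; a real backend would keep that state behind `backend_state`. The helper's real signature may differ.

```cpp
#include <cassert>

namespace sketch {
// Stand-in for meep::fields with just enough state to observe syncs.
struct fields {
  bool host_stale = false; // "host arrays out of date" (normally backend-private)
  int downloads = 0;       // number of sync_to_host calls
};

struct backend_hooks {
  void (*sync_to_host)(fields *f);
  bool (*needs_host_sync)(const fields *f);
};
backend_hooks meep_backend = {};

// Called at the top of each host-array readout site. With no backend
// installed, both pointers are null and the helper does nothing.
inline void sync_host_if_needed(fields *f) {
  if (meep_backend.needs_host_sync && meep_backend.sync_to_host &&
      meep_backend.needs_host_sync(f))
    meep_backend.sync_to_host(f);
}
} // namespace sketch
```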

The previous version's `if (!doing_solve_cw && meep_backend.step && ...)` assumed `doing_solve_cw` was visible from `fields::step()`. On clean upstream meep that name is not in scope there, so the build failed with:

step.cpp: error: 'doing_solve_cw' was not declared in this scope

Move the suppression to the caller: `solve_cw` saves and nulls `meep_backend.step` for the duration of its CG loop, then restores it before the final sync_from_host. Same effect, no coupling to upstream private state.

…redicate)

Changes the hook table shape based on review notes:

* Add `read_point` hook for fast single-cell field reads. Avoids per-step full-grid downloads on the LDOS / point-monitor hot path. Falls back to sync_to_host + direct array read when the hook is null.
* Add `bool fields::backend_suspended` flag. When true, the step hook is bypassed and the in-tree CPU step path runs instead. Replaces the global save/null/restore of `meep_backend.step` that solve_cw previously did, which was fragile under any concurrent access to the backend table.
* Collapse `is_active` and `host_is_stale` into a single `needs_host_sync` predicate. A backend that's not active returns false; a backend whose host arrays are already in sync returns false; otherwise true. Simpler call-site logic, smaller struct.
* Rework header dependency direction. `meep/backend_hooks.hpp` now includes `meep.hpp` (rather than the other way round), which lets the hook signature use real meep types (`realnum`, `component`, `fields`, `fields_chunk`) without typedef gymnastics. The 8 source files that use a hook now `#include "meep/backend_hooks.hpp"` directly.
* `dft_ldos.cpp` is wired through the new `read_point` hook with a fallback to direct array reads when no backend implements it.

No behavior change for vanilla meep: every hook still defaults to null, every call site is still a null-pointer test.
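Because `backend_suspended` is a per-simulation flag, it also lends itself to a scoped guard, so that a routine like solve_cw cannot forget to clear it on an early exit. The sketch below is hypothetical: the PR does not add such a class, and the stand-in `fields` carries only the flag.

```cpp
#include <cassert>
#include <stdexcept>

namespace sketch {
struct fields {
  bool backend_suspended = false;
};

// Hypothetical helper (not part of the PR): suspends the backend for one
// scope and restores the previous value on any exit, including a throw.
class backend_suspension {
public:
  explicit backend_suspension(fields &f) : f_(f), saved_(f.backend_suspended) {
    f_.backend_suspended = true;
  }
  ~backend_suspension() { f_.backend_suspended = saved_; }
  backend_suspension(const backend_suspension &) = delete;
  backend_suspension &operator=(const backend_suspension &) = delete;

private:
  fields &f_;
  bool saved_;
};

// Shape of a solve_cw-like routine: iterate on host arrays with the step
// hook bypassed, then resume normal stepping when the guard goes away.
void solve_cw_like(fields &f, bool fail) {
  backend_suspension guard(f);
  assert(f.backend_suspended);
  // ... CG iterations that step on the CPU path would go here ...
  if (fail) throw std::runtime_error("did not converge");
}
} // namespace sketch
```

Saving and restoring the previous value, rather than unconditionally writing false, keeps nested suspensions correct.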

Adds tests/backend_hooks.cpp, a counting "transparent" backend that implements every hook except read_point and defers all real work to the CPU path. The test asserts:

1. Numerical results are bit-identical between a baseline run (no backend) and one with the transparent backend installed. This is the load-bearing claim for backend authors: the hook surface itself doesn't perturb values.
2. Hooks fire at the expected sites: `init` exactly once on construction, `cleanup` exactly once on destruction, `step` once per fields::step() call, sync_to_host never (since the needs_host_sync predicate returns false), read_point never (since the hook is left null).

This test runs in `make check` alongside the other in-tree tests.
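Here is a condensed sketch of the counting idea, using stand-in types rather than meep's. The transparent backend counts calls, never claims a step, and always reports that the host arrays are in sync. It illustrates the two assertions described above; it is not the code in tests/backend_hooks.cpp.

```cpp
#include <cassert>

namespace sketch {
struct fields;
struct backend_hooks {
  void (*init)(fields *);
  void (*cleanup)(fields *);
  bool (*step)(fields *);
  void (*sync_to_host)(fields *);
  bool (*needs_host_sync)(const fields *);
};
backend_hooks meep_backend = {};

struct fields {
  double value = 1.0; // stand-in for the field data
  fields() { if (meep_backend.init) meep_backend.init(this); }
  ~fields() { if (meep_backend.cleanup) meep_backend.cleanup(this); }
  fields(const fields &) = delete;

  void step() {
    if (meep_backend.step && meep_backend.step(this)) return;
    value *= 0.5; // placeholder for the real CPU update
  }
  double read() { // a readout site: sync first if the backend asks for it
    if (meep_backend.needs_host_sync && meep_backend.sync_to_host &&
        meep_backend.needs_host_sync(this))
      meep_backend.sync_to_host(this);
    return value;
  }
};
} // namespace sketch

// Counting "transparent" backend: records calls and never takes over.
struct hook_counts {
  int init = 0, cleanup = 0, step = 0, sync = 0;
} counts;

void install_transparent_backend() {
  sketch::meep_backend.init = [](sketch::fields *) { ++counts.init; };
  sketch::meep_backend.cleanup = [](sketch::fields *) { ++counts.cleanup; };
  sketch::meep_backend.step = [](sketch::fields *) { ++counts.step; return false; };
  sketch::meep_backend.sync_to_host = [](sketch::fields *) { ++counts.sync; };
  sketch::meep_backend.needs_host_sync = [](const sketch::fields *) { return false; };
}

double run(int nsteps) {
  sketch::fields f;
  for (int i = 0; i < nsteps; ++i) f.step();
  return f.read();
}
```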

doc/docs/Backend_Hooks.md describes the hook contract: the table shape, lifecycle ordering, opaque-state semantics, per-hook contracts, suspension, and a minimal "transparent" backend example. Kept intentionally minimal: ~100 lines. A longer-form walkthrough (real backend implementation, debugging, performance pitfalls) can be written as a follow-up doc PR if there's interest.

Previous version aborted in CI. Likely culprits removed:

* Used `gv.surroundings()` -> `field_energy_in_box(volume)` chain; swapped for the no-arg `f.field_energy()`, which goes through `user_volume.surroundings()` itself.
* Used `f.use_real_fields()` (state coupling, can abort on Bloch).
* Used `continuous_src_time` (energy can grow unboundedly).

New shape mirrors `tests/one_dimensional.cpp`: `volone(6.0, 10.0)`, gaussian pulse via the explicit-args `add_point_source` overload, 100 timesteps, compare `field_energy()`. Also relaxes the strict step-count equality to `>=` so any internal step() calls don't trip the assertion.

Without this, single-cell field queries (mode analysis, harminv, point monitors) fall through to sync_host_if_needed, which triggers a full-grid host sync per query when a backend keeps fields on device. With the hook installed, the read goes directly to the backend's shadow storage without a sync. Falls back to the host-array path when no read_point hook is installed (i.e., bit-identical to the previous behavior for vanilla meep and for backends that don't implement the hook). Mirrors the existing wiring in dft_ldos.cpp.

Lifts the read_point + fallback pattern that was duplicated in dft_ldos.cpp and monitor.cpp into a single inline helper:

`inline realnum read_field_at(const fields *f, const fields_chunk *fc, component c, int cmp, ptrdiff_t idx);`

If a backend has installed `read_point`, route through it; otherwise read `fc->f[c][cmp][idx]` directly. Caller still calls `sync_host_if_needed` once before the loop when no point-read hook is available. Both sites collapse from a ternary-per-cell to a single function call. No behavior change.
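Here is a minimal sketch of the helper's logic against stand-in types. `sketch::realnum`, the two-member `component` enum, and the `f[component][cmp]` array layout are assumptions chosen for illustration. Only the dispatch (use the hook if installed, otherwise load directly from the host array) follows the description above.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {
using realnum = double;                 // meep's realnum may be float or double
enum component { Ex, Ey, NUM_COMPONENTS }; // truncated stand-in, not meep's enum

struct fields;
struct fields_chunk {
  realnum *f[NUM_COMPONENTS][2] = {}; // host arrays: [component][real/imag]
};

struct backend_hooks {
  realnum (*read_point)(const fields *f, const fields_chunk *fc, component c, int cmp,
                        std::ptrdiff_t idx);
};
backend_hooks meep_backend = {};

// Hook if installed; otherwise a direct host-array load. In the no-hook
// case the caller is expected to have synced the host arrays beforehand.
inline realnum read_field_at(const fields *f, const fields_chunk *fc, component c, int cmp,
                             std::ptrdiff_t idx) {
  if (meep_backend.read_point) return meep_backend.read_point(f, fc, c, cmp, idx);
  return fc->f[c][cmp][idx];
}
} // namespace sketch
```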

Adds an "MPI" section to doc/docs/Backend_Hooks.md spelling out the implicit cross-rank contracts that backends must respect:

1. step() return value must be collective (otherwise MPI deadlock in step_boundaries on the rank that took the CPU fallback).
2. backend_suspended must be toggled collectively.
3. init/cleanup run in collective contexts (safe to do MPI/NCCL setup/teardown there).
4. read_point is local-only (call sites already check is_mine()).

Also notes that sync_to_host / sync_from_host are per-rank. Doc-only change.

stevengj · 2026-05-01T12:17:28Z

> This PR turns that recurring patch into a stable extension point so backends can ship as out-of-tree libraries pinned to specific meep versions.

I'm not sure the goal of Meep development is to make it easier to maintain private forks?

Note that you can't "ship" proprietary out-of-tree backends outside of your company due to the GPL, though of course you can use modified private forks in-house without distributing.

Luochenghuang and others added 13 commits April 29, 2026 14:02

Merge branch 'NanoComp:master' into backend-hooks (e57f4a6)

backend-hooks: clang-format fixups (a5b1e34)

backend-hooks: clang-format the test file (6d5783d)

Luochenghuang wants to merge 13 commits into NanoComp:master from Luochenghuang:backend-hooks
