Add backend extension-point hooks for out-of-tree FDTD backends #3211
Draft
Luochenghuang wants to merge 13 commits intoNanoComp:masterfrom
Introduces a small C-ABI table of function pointers that an external
backend library (e.g. CUDA, ROCm, vectorized-CPU) can install at load
time to redirect hot paths through its own implementation. All hooks
default to null; null means "fall through to the in-tree CPU code", so
a stock build with no backend loaded behaves bit-identically to one
without these hooks.
This commit lands the foundation:
* src/meep/backend_hooks.hpp: flat C-ABI hook table + helpers
* src/backend_hooks.cpp: the process-global table (zeroed)
* src/meep.hpp: `void *backend_state` slot on `fields` and
  `fields_chunk`, opaque storage owned by the backend
* src/fields.cpp: call `meep_backend.init` at end of `fields` ctor
  (primary + copy) and `meep_backend.cleanup` at start of `~fields`,
  while chunks are still alive
* src/step.cpp: step-dispatch hook; if a backend returns true from
  `step(this)`, the CPU step path is skipped
A follow-up commit will add `sync_host_if_needed` calls at the CPU
readout sites (DFT, monitors, integration, dump/load, CW solver) so
backends never have to patch upstream files.
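To make the lifecycle and step-dispatch wiring above concrete, here is a minimal sketch. It is not the code from this commit: `fields` is a toy stand-in for `meep::fields`, the hook signatures are assumptions, and `demo_step` and `cpu_steps` are hypothetical names used only for illustration.

```cpp
#include <cassert>

struct fields; // toy stand-in for meep::fields

// Hypothetical hook table: plain function pointers, no virtual dispatch.
struct backend_hooks {
  void (*init)(fields *f);    // called at the end of the fields ctor
  void (*cleanup)(fields *f); // called at the start of ~fields
  bool (*step)(fields *f);    // return true to skip the CPU step path
};

// Process-global table. Static storage duration means every pointer
// starts out null, so a build with no backend takes the CPU path.
backend_hooks meep_backend;

struct fields {
  void *backend_state = nullptr; // opaque storage owned by the backend
  int cpu_steps = 0;             // counts how often the CPU path ran

  fields() {
    if (meep_backend.init) meep_backend.init(this);
  }
  ~fields() {
    if (meep_backend.cleanup) meep_backend.cleanup(this);
  }
  void step() {
    // A null hook, or a hook returning false, falls through to the CPU.
    if (meep_backend.step && meep_backend.step(this)) return;
    ++cpu_steps;
  }
};

// Hypothetical backend step that claims every step.
static int backend_steps = 0;
static bool demo_step(fields *) {
  ++backend_steps;
  return true;
}
```

With the table left zeroed, `fields::step()` only ever runs the CPU branch, which is the property the commit message relies on for stock builds.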
Now that backends can install hooks (foundation commit), call them at
each upstream code path that reads or writes the canonical host
arrays. No-op when no backend is loaded.
* src/loop_in_chunks.cpp: sync_host_if_needed at the entry of
  fields::loop_in_chunks (covers integrate, energy/flux, DFT add/output
  chunkloops, and every other loop-in-chunks consumer)
* src/monitor.cpp: sync_host_if_needed at the chunk-iterating leaf of
  fields::get_field; the other get_field overloads funnel through here
* src/dft.cpp: sync_host_if_needed at fields::dft_norm (reads DFT
  accumulators directly)
* src/fields_dump.cpp: sync_host_if_needed before fields::dump writes
  HDF5; sync_from_host after fields::load reads HDF5 back in
* src/cw_fields.cpp: sync_host_if_needed at the start of
  fields::solve_cw; sync_from_host at the end so normal stepping resumes
  against the converged solution
* src/step.cpp: guard the step-dispatch hook with !doing_solve_cw so the
  backend stays suspended during CW iterations
After this commit, an external backend that implements
`init`, `cleanup`, `step`, `sync_to_host`, `sync_from_host`,
`is_active`, and `host_is_stale` should be able to run a full meep
simulation without patching any upstream files.
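A readout site guarded this way might look like the sketch below. The hook signatures, the toy `fields` with a single `host` vector, and the `toy_*` backend are all assumptions made for illustration; only the hook names come from the commit message.

```cpp
#include <cassert>
#include <vector>

struct fields {
  std::vector<double> host;      // canonical host array read by CPU code
  void *backend_state = nullptr; // opaque, owned by the backend
};

struct backend_hooks {
  bool (*is_active)(const fields *);
  bool (*host_is_stale)(const fields *);
  void (*sync_to_host)(fields *);
  void (*sync_from_host)(fields *);
};
backend_hooks meep_backend; // zero-initialized: all hooks null

// No-op unless a backend is loaded, active, and holds newer data.
inline void sync_host_if_needed(fields *f) {
  if (!meep_backend.sync_to_host) return;
  if (meep_backend.is_active && !meep_backend.is_active(f)) return;
  if (meep_backend.host_is_stale && !meep_backend.host_is_stale(f)) return;
  meep_backend.sync_to_host(f);
}

// Stand-in for a readout site such as an integration loop.
double sum_fields(fields *f) {
  sync_host_if_needed(f);
  double s = 0.0;
  for (double v : f->host) s += v;
  return s;
}

// Toy backend that keeps its own "device" copy of the data.
struct toy_device {
  std::vector<double> data;
  bool stale = false; // true when the host copy is out of date
};
static bool toy_active(const fields *f) { return f->backend_state != nullptr; }
static bool toy_stale(const fields *f) {
  const toy_device *d = static_cast<const toy_device *>(f->backend_state);
  return d != nullptr && d->stale;
}
static void toy_to_host(fields *f) {
  toy_device *d = static_cast<toy_device *>(f->backend_state);
  f->host = d->data; // "download" to the host array
  d->stale = false;
}
```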
The previous version's `if (!doing_solve_cw && meep_backend.step && ...)` assumed `doing_solve_cw` was visible from `fields::step()`. On clean upstream meep that field is private, so the build failed with:
    step.cpp: error: 'doing_solve_cw' was not declared in this scope
Move the suppression to the caller: `solve_cw` saves and nulls `meep_backend.step` for the duration of its CG loop, then restores it before the final sync_from_host. Same effect, no coupling to upstream private state.
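One way to express the save/null/restore pattern is a small scope guard, sketched here. The commit does not say it uses RAII; `step_hook_suspender`, `solve_cw_sketch`, and the toy `fields` are hypothetical names for illustration only.

```cpp
#include <cassert>

struct fields;
struct backend_hooks {
  bool (*step)(fields *);
};
backend_hooks meep_backend; // zero-initialized

struct fields {
  int cpu_steps = 0;
  void step() {
    if (meep_backend.step && meep_backend.step(this)) return;
    ++cpu_steps; // CPU path
  }
};

// Saves the step hook, nulls it, and restores it on scope exit.
class step_hook_suspender {
public:
  step_hook_suspender() : saved_(meep_backend.step) { meep_backend.step = nullptr; }
  ~step_hook_suspender() { meep_backend.step = saved_; }
  step_hook_suspender(const step_hook_suspender &) = delete;
  step_hook_suspender &operator=(const step_hook_suspender &) = delete;

private:
  bool (*saved_)(fields *);
};

// Stand-in for the iterative loop inside solve_cw.
int solve_cw_sketch(fields &f, int iters) {
  {
    step_hook_suspender guard; // backend step hook is null in this scope
    for (int i = 0; i < iters; ++i) f.step();
  } // hook restored here, before any final sync_from_host
  return f.cpu_steps;
}

static bool claim_step(fields *) { return true; }
```

Because the guard mutates a process-global table, it carries the same fragility under concurrent access as any other global save/restore.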
…redicate)
Changes the hook table shape based on review notes:
* Add `read_point` hook for fast single-cell field reads. Avoids
per-step full-grid downloads on the LDOS / point-monitor hot path.
Fall back to sync_to_host + direct array read when the hook is null.
* Add `bool fields::backend_suspended` flag. When true, the step hook
is bypassed and the in-tree CPU step path runs instead. Replaces
the global save/null/restore of `meep_backend.step` that solve_cw
previously did, which was fragile under any concurrent access to
the backend table.
* Collapse `is_active` and `host_is_stale` into a single
`needs_host_sync` predicate. A backend that's not active returns
false; a backend whose host arrays are already in sync returns
false; otherwise true. Simpler call-site logic, smaller struct.
* Rework header dependency direction. `meep/backend_hooks.hpp` now
includes `meep.hpp` (rather than the other way round), which lets
the hook signature use real meep types (`realnum`, `component`,
`fields`, `fields_chunk`) without typedef gymnastics. The 8 source
files that use a hook now `#include "meep/backend_hooks.hpp"`
directly.
* `dft_ldos.cpp` is wired through the new `read_point` hook with a
fallback to direct array reads when no backend implements it.
No behavior change for vanilla meep: every hook still defaults to null,
every call site is still a null-pointer test.
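The reshaped call sites could be sketched as follows, assuming a toy `fields` and guessed hook signatures. `host_syncs`, `claim_step`, and the `always_*`/`never_*` predicates are hypothetical helpers, not part of the PR.

```cpp
#include <cassert>

struct fields;
struct backend_hooks {
  bool (*step)(fields *);
  bool (*needs_host_sync)(const fields *);
  void (*sync_to_host)(fields *);
};
backend_hooks meep_backend; // zero-initialized

struct fields {
  bool backend_suspended = false; // per-sim flag, replaces the global null-out
  int cpu_steps = 0;
  int host_syncs = 0;

  void step() {
    // The hook is bypassed while this simulation is suspended.
    if (!backend_suspended && meep_backend.step && meep_backend.step(this)) return;
    ++cpu_steps;
  }
};

// A single predicate covers both "backend inactive" and "host already
// current": in either case the backend returns false.
inline void sync_host_if_needed(fields *f) {
  if (meep_backend.needs_host_sync && meep_backend.sync_to_host &&
      meep_backend.needs_host_sync(f))
    meep_backend.sync_to_host(f);
}

static bool claim_step(fields *) { return true; }
static bool always_stale(const fields *) { return true; }
static bool never_stale(const fields *) { return false; }
static void count_sync(fields *f) { ++f->host_syncs; }
```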
Adds tests/backend_hooks.cpp -- a counting "transparent" backend that
implements every hook but defers all real work to the CPU path. The
test asserts:
1. Numerical results are bit-identical between a baseline run (no
backend) and one with the transparent backend installed. This is
the load-bearing claim for backend authors: the hook surface
itself doesn't perturb values.
2. Hooks fire at the expected sites: `init` exactly once on
construction, `cleanup` exactly once on destruction, `step` once
per fields::step() call, sync_to_host never (since the
needs_host_sync predicate returns false), read_point never (since
the hook is left null).
This test runs in `make check` alongside the other in-tree tests.
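The shape of such a test can be illustrated without meep itself. In this sketch the toy `fields`, its smoothing update, and the `count_*` hooks are assumptions; the real test drives an actual simulation and compares `field_energy()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct fields;
struct backend_hooks {
  void (*init)(fields *);
  void (*cleanup)(fields *);
  bool (*step)(fields *);
};
backend_hooks meep_backend; // zero-initialized

struct fields {
  std::vector<double> e;
  explicit fields(std::size_t n) : e(n, 0.0) {
    e[n / 2] = 1.0; // initial pulse
    if (meep_backend.init) meep_backend.init(this);
  }
  ~fields() {
    if (meep_backend.cleanup) meep_backend.cleanup(this);
  }
  void step() {
    if (meep_backend.step && meep_backend.step(this)) return;
    // Toy deterministic update standing in for the CPU step path.
    std::vector<double> next(e);
    for (std::size_t i = 1; i + 1 < e.size(); ++i)
      next[i] = 0.5 * e[i] + 0.25 * (e[i - 1] + e[i + 1]);
    e.swap(next);
  }
};

// Counting "transparent" backend: records calls, defers all work to the CPU.
struct hook_counts {
  int init = 0, cleanup = 0, step = 0;
};
hook_counts counts;
static void count_init(fields *) { ++counts.init; }
static void count_cleanup(fields *) { ++counts.cleanup; }
static bool count_step(fields *) {
  ++counts.step;
  return false; // false means "not handled": fall through to the CPU path
}

static std::vector<double> run(int nsteps) {
  fields f(16);
  for (int i = 0; i < nsteps; ++i) f.step();
  return f.e;
}

// Runs once without and once with the backend, then compares raw bytes.
bool transparent_backend_is_bit_identical(int nsteps) {
  meep_backend = backend_hooks{};
  counts = hook_counts{};
  const std::vector<double> baseline = run(nsteps);
  meep_backend.init = count_init;
  meep_backend.cleanup = count_cleanup;
  meep_backend.step = count_step;
  const std::vector<double> hooked = run(nsteps);
  meep_backend = backend_hooks{};
  return baseline.size() == hooked.size() &&
         std::memcmp(baseline.data(), hooked.data(),
                     baseline.size() * sizeof(double)) == 0;
}
```

Comparing with `memcmp` rather than a tolerance is what makes the check bit-identical rather than merely approximate.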
doc/docs/Backend_Hooks.md describes the hook contract: the table shape, lifecycle ordering, opaque-state semantics, per-hook contracts, suspension, and a minimal "transparent" backend example. Kept intentionally minimal: ~100 lines. A longer-form walkthrough (real backend implementation, debugging, performance pitfalls) can be written as a follow-up doc PR if there's interest.
Previous version aborted in CI. Likely culprits removed:
* Used `gv.surroundings()` -> `field_energy_in_box(volume)` chain;
swapped for the no-arg `f.field_energy()` which goes through
`user_volume.surroundings()` itself.
* Used `f.use_real_fields()` (state coupling, can abort on Bloch).
* Used `continuous_src_time` (energy can grow unboundedly).
New shape mirrors `tests/one_dimensional.cpp`: `volone(6.0, 10.0)`,
gaussian pulse via the explicit-args `add_point_source` overload,
100 timesteps, compare `field_energy()`. Also relaxes the strict
step-count equality to `>=` so any internal step() calls don't
trip the assertion.
Without this, single-cell field queries (mode analysis, harminv, point monitors) fall through to sync_host_if_needed, which triggers a full-grid host sync per query when a backend keeps fields on device. With the hook installed, the read goes directly to the backend's shadow storage without a sync. Falls back to the host-array path when no read_point hook is installed (i.e., bit-identical to the previous behavior for vanilla meep and for backends that don't implement the hook). Mirrors the existing wiring in dft_ldos.cpp.
Lifts the read_point + fallback pattern that was duplicated in
dft_ldos.cpp and monitor.cpp into a single inline helper:
    inline realnum read_field_at(const fields *f, const fields_chunk *fc,
                                 component c, int cmp, ptrdiff_t idx);
If a backend has installed `read_point`, route through it; otherwise
read `fc->f[c][cmp][idx]` directly. Caller still calls
`sync_host_if_needed` once before the loop when no point-read hook
is available.
Both sites collapse from a ternary-per-cell to a single function call.
No behavior change.
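A possible body for that helper is sketched below. The `read_field_at` signature and the `fc->f[c][cmp][idx]` fallback are taken from the commit message; the `read_point` signature, the toy types, and `shadow_read` are assumptions.

```cpp
#include <cassert>
#include <cstddef>

typedef double realnum; // assumption: stand-in for meep's realnum
enum component { Ex, Ey, Ez, NUM_COMPONENTS }; // toy subset

struct fields {
  void *backend_state = nullptr;
};
struct fields_chunk {
  realnum *f[NUM_COMPONENTS][2]; // [component][real/imag] host arrays
};

struct backend_hooks {
  // Assumed signature: same arguments as read_field_at.
  realnum (*read_point)(const fields *, const fields_chunk *, component, int,
                        std::ptrdiff_t);
};
backend_hooks meep_backend; // zero-initialized

inline realnum read_field_at(const fields *f, const fields_chunk *fc,
                             component c, int cmp, std::ptrdiff_t idx) {
  if (meep_backend.read_point) return meep_backend.read_point(f, fc, c, cmp, idx);
  return fc->f[c][cmp][idx]; // no backend: plain host-array load
}

// Toy backend that serves reads from its own shadow storage.
static realnum shadow[4] = {40.0, 41.0, 42.0, 43.0};
static realnum shadow_read(const fields *, const fields_chunk *, component, int,
                           std::ptrdiff_t idx) {
  return shadow[idx];
}
```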
Adds an "MPI" section to doc/docs/Backend_Hooks.md spelling out the
implicit cross-rank contracts that backends must respect:
1. step() return value must be collective (otherwise MPI deadlock
in step_boundaries on the rank that took the CPU fallback).
2. backend_suspended must be toggled collectively.
3. init/cleanup run in collective contexts (safe to do MPI/NCCL
setup/teardown there).
4. read_point is local-only (call sites already check is_mine()).
Also notes that sync_to_host / sync_from_host are per-rank.
Doc-only change.
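The first contract amounts to a logical-AND reduction over per-rank votes. The sketch below simulates that in plain C++ so it runs without an MPI installation; a real backend would more likely call `MPI_Allreduce` with `MPI_LAND`. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simulated reduction: true only if every rank can take the backend path.
// Stand-in for an MPI_Allreduce with MPI_LAND over one flag per rank.
bool all_ranks_agree(const std::vector<bool> &local_votes) {
  return std::all_of(local_votes.begin(), local_votes.end(),
                     [](bool v) { return v; });
}

// What each rank's step hook should return: the shared decision, never
// the rank's own local vote.
std::vector<bool> step_return_values(const std::vector<bool> &local_votes) {
  const bool decision = all_ranks_agree(local_votes);
  return std::vector<bool>(local_votes.size(), decision);
}

// True when every rank would take the same path through step().
bool is_collective(const std::vector<bool> &returns) {
  for (std::size_t i = 1; i < returns.size(); ++i)
    if (returns[i] != returns[0]) return false;
  return true;
}
```

Returning the raw local votes `{true, false, true}` would leave rank 1 alone in `step_boundaries`; returning the reduced value keeps all ranks on the same path.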
Collaborator
I'm not sure the goal of Meep development is to make it easier to maintain private forks? Note that you can't "ship" proprietary out-of-tree backends outside of your company due to the GPL, though of course you can use modified private forks in-house without distributing.
Summary
Adds a small process-global function-pointer table (POD layout, no virtual dispatch) that lets an external library, such as a CUDA, ROCm, or vectorized-CPU backend or an instrumentation shim, plug into meep's hot paths at load time without forking the codebase.
All hooks default to null. Null means "fall through to the in-tree CPU implementation." A stock build with no backend loaded behaves bit-identically to one without these hooks. This is asserted by a new test (`tests/backend_hooks.cpp`) that runs under `make check`.

What this PR adds
* `src/meep/backend_hooks.hpp`: a flat function-pointer struct (`meep::backend_hooks`) plus two inline helpers (`sync_host_if_needed`, `read_field_at`). Hooks: `init`, `cleanup`, `step`, `sync_to_host`, `sync_from_host`, `read_point`, `needs_host_sync`.
* `src/backend_hooks.cpp`: the process-global table, zero-initialized.
* `fields::backend_state` and `fields_chunk::backend_state`: opaque `void *` slots for backend-owned per-sim and per-chunk state. Upstream meep never inspects them.
* `fields::backend_suspended`: when true, the step hook is bypassed and the in-tree CPU step path runs instead. Used by `solve_cw`, which iterates against host arrays directly.
* Hook call sites: step dispatch (`step.cpp`), sim lifecycle (`fields.cpp`), generic readout dispatcher (`loop_in_chunks.cpp`), `get_field` (`monitor.cpp`), `dft_norm` (`dft.cpp`), `dump`/`load` (`fields_dump.cpp`), `solve_cw` (`cw_fields.cpp`), and the LDOS hot path (`dft_ldos.cpp`).
* `tests/backend_hooks.cpp`: a counting "transparent" backend that implements every hook but defers all work to the CPU. Asserts (a) bit-identical numerical results between baseline and hooked runs, and (b) hooks fire at the expected sites with the expected counts. Runs in `make check`.
* `doc/docs/Backend_Hooks.md`: contract reference covering hook signatures, lifecycle ordering, opaque-state semantics, suspension, MPI contracts (collective `step()` return, etc.), and a tiny skeleton-backend example.

Total invasive change to existing source files: ~80 lines, all 1 to 7 line insertions. The bulk of the PR is the new header, the test, and the doc.
What this PR does not change
* `dft_ldos.cpp` and `monitor.cpp::get_field` use the inline `read_field_at` helper, which reduces to a direct array load when no backend is installed (at `-O2` and above, inlining and loop-unswitching can hoist the null check out of the loop).

Why
Today, anyone who wants to add a non-CPU backend to meep has to fork the repository and patch the same ~30 lines in `step.cpp`, `cw_fields.cpp`, the readout sites, and the `fields` class declaration. That fork then needs to be rebased against upstream forever. This PR turns that recurring patch into a stable extension point so backends can ship as out-of-tree libraries pinned to specific meep versions.

The motivating use case is a multi-GPU CUDA/NCCL backend that today lives as a fork; with these hooks, it can move to a sibling repo with no patches against upstream meep.
Hook surface
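As a rough illustration of the table's shape, the sketch below declares the seven hooks named in this description and installs two of them from a load-time constructor. The signatures, the `sketch` namespace, and the `my_*` functions are assumptions, not the contents of `backend_hooks.hpp`; `__attribute__((constructor))` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {
typedef double realnum;        // assumption: stand-in for meep's realnum
enum component { Ex, Ey, Ez }; // toy subset
struct fields;                 // opaque here
struct fields_chunk;           // opaque here

// Flat table of plain function pointers (POD layout, no virtual dispatch).
struct backend_hooks {
  void (*init)(fields *);
  void (*cleanup)(fields *);
  bool (*step)(fields *);
  void (*sync_to_host)(fields *);
  void (*sync_from_host)(fields *);
  realnum (*read_point)(const fields *, const fields_chunk *, component, int,
                        std::ptrdiff_t);
  bool (*needs_host_sync)(const fields *);
};

// Zero-initialized before any load-time constructor runs.
backend_hooks hooks;
} // namespace sketch

// Backend side: hypothetical implementations.
static bool my_step(sketch::fields *) { return true; }
static bool my_needs_host_sync(const sketch::fields *) { return false; }

// Runs when the library (or executable) is loaded, before main().
__attribute__((constructor)) static void install_backend() {
  sketch::hooks.step = my_step;
  sketch::hooks.needs_host_sync = my_needs_host_sync;
}
```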
All hooks default to null. Backends populate the table at load time (typically via `__attribute__((constructor))`).

MPI
The hook table is process-global; each MPI rank has its own. Backends are responsible for any cross-rank coordination (NCCL, MPI, `sum_to_all`, etc.). Meep never synchronizes hook invocations across ranks.

The doc page spells out the implicit contracts in detail. The most important one: `step()`'s return value must be collective. If one rank returns `true` (skip CPU path) and another returns `false`, the latter will execute `step_boundaries`, which calls `MPI_Sendrecv`, while the former will not, and the run will deadlock. A backend that handles some configurations but not others must agree across ranks before returning, or simply always return `true` once installed.

ABI
No formal ABI versioning. Backends are expected to be built against a specific meep version, like Postgres extensions or out-of-tree kernel modules. Adding a new function pointer at the end of the struct is forward-compatible because the global is zero-initialized; reordering or removing fields breaks ABI for already-built backends.
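The append-only rule can be checked mechanically. In this sketch `hooks_v1` and `hooks_v2` are hypothetical layouts, and the byte-offset write models a backend binary that was compiled against the older header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct fields;
typedef void (*init_fn)(fields *);
typedef bool (*step_fn)(fields *);
typedef double (*read_fn)(const fields *, std::ptrdiff_t);

// Layout an older backend was compiled against.
struct hooks_v1 {
  init_fn init;
  step_fn step;
};

// Newer layout: one pointer appended at the end, nothing reordered.
struct hooks_v2 {
  init_fn init;
  step_fn step;
  read_fn read_point; // new in v2
};

// Existing members keep their offsets, so old binaries still hit them.
static_assert(offsetof(hooks_v2, init) == offsetof(hooks_v1, init), "init moved");
static_assert(offsetof(hooks_v2, step) == offsetof(hooks_v1, step), "step moved");

hooks_v2 table; // the newer library's global, zero-initialized

// Models the old backend: it knows only the v1 offsets, so it writes the
// step pointer at offsetof(hooks_v1, step) and never touches read_point.
void old_backend_install(void *table_addr, step_fn fn) {
  std::memcpy(static_cast<char *>(table_addr) + offsetof(hooks_v1, step), &fn,
              sizeof fn);
}

static bool demo_step(fields *) { return true; }
```

Had `read_point` been inserted before `step`, the second `static_assert` would fail, which is the reordering hazard the paragraph above describes.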
Open question
Naming. `meep_backend_*` (current) vs `meep::backend::*` vs something else. Happy to rename.