feat!: provenance for reconstruction and simulation files (#225) #236
Helveg wants to merge 14 commits into
Reconstruction files (HDF5 and FS engines) now carry a root-level provenance bundle: a permanent storage_id (UUID4), a monotonic state_id revision counter, timestamps, the bsb-core/engine versions, a plugin manifest, host info and mpi_size. Placement and connectivity sets gain per-set revision/timestamps (and morphology_hashes for placement); the file store records content_sha256 and a producer per file. Legacy files without a bundle are backfilled on first write.
Simulation result (.nio) files annotate the Block with a bsb_provenance dict that back-references the reconstruction (storage_id + state_id), the simulator and its version, the plugin manifest, timing, seed and host. Each recorder's Neo objects follow a documented bsb_* annotation convention identifying the source cell (ps_name, cell_id, cell_model) and device. Adds read_nio / iter_recordings helpers.
BREAKING CHANGE: recorder output annotations changed from the ad-hoc device/senders/cell_type/cell_id keys to the bsb_* convention, and the storage root gains a provenance schema.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
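As a rough illustration of the identity/revision split described in this commit, the sketch below models the bundle as a plain dict. The helper names `new_bundle` and `bump_state`, the starting value of `state_id`, and the reduced key set are assumptions for illustration only, not the bsb implementation.

```python
import uuid
from datetime import datetime, timezone


def new_bundle(engine_name):
    """Build a fresh provenance bundle (hypothetical layout, not the bsb schema)."""
    return {
        "storage_id": str(uuid.uuid4()),  # permanent identity of the file
        "state_id": 0,  # assumed starting value of the revision counter
        "created_at": datetime.now(timezone.utc).isoformat(),
        "engine_name": engine_name,
    }


def bump_state(bundle):
    """Advance the revision counter on a mutating write; the identity is untouched."""
    bundle["state_id"] += 1
    return bundle["state_id"]


bundle = new_bundle("hdf5")
identity = bundle["storage_id"]
bump_state(bundle)
bump_state(bundle)
```

The point of the split is that `storage_id` answers "which file is this?" while `state_id` answers "which revision of it?", so a reference holding both can detect later mutation.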
The FS engine's _bump_state and legacy-upgrade wrote metadata.json without the engine lock, so concurrent MPI ranks raced on the tmp+os.replace and could clobber each other (and double-upgrade a legacy root). Take the write lock for the read-modify-write, and re-check inside the lock during upgrade so only the first rank stamps the bundle.
Mark the three single-rank provenance tests skip_parallel: they assert behaviour that only holds on one rank (the upgrade warning is emitted on the main rank only, and two use rank-local temp paths / exact direct bump counts).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
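The locking pattern this commit describes can be sketched with the standard library alone. Here a `threading.Lock` stands in for the engine's MPI-aware write lock, and the function names and file layout are hypothetical; only the shape (lock, re-read, re-check, staged write, `os.replace`) mirrors the description above.

```python
import json
import os
import tempfile
import threading
import uuid

_write_lock = threading.Lock()  # stand-in for the engine's write lock (assumption)


def _write_atomic(path, data):
    """Stage the JSON next to the target, then swap it in with os.replace."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
    os.replace(tmp, path)


def upgrade_if_legacy(path):
    """Stamp a bundle only if none exists; the check happens inside the lock."""
    with _write_lock:
        with open(path) as f:
            meta = json.load(f)
        if "storage_id" not in meta:  # re-check: another writer may have won
            meta["storage_id"] = str(uuid.uuid4())
            meta["state_id"] = 0
            _write_atomic(path, meta)
        return meta["storage_id"]


def bump_state(path):
    """Read-modify-write of the revision counter under the lock."""
    with _write_lock:
        with open(path) as f:
            meta = json.load(f)
        meta["state_id"] += 1
        _write_atomic(path, meta)
        return meta["state_id"]
```

Holding the lock across the whole read-modify-write is what prevents lost updates; the atomic replace alone only guarantees that readers never see a half-written file.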
One first note to self: level 7 currently seems to assume that everything is a cell; we need a set of standardized attributes for synapses, compartments, (other?) as well.
Here are my first thoughts on this proposal (I did not check the code implementation yet): Regarding reconstruction files:
Regarding simulation results files:
I'll edit my post and will mark some attributes as "possibly helpful but not so important" like
…et kind
Split the recorder convention into a baseline every recorder shares
(bsb_device_name/kind, bsb_target_kind, bsb_simulation_id/segment_id) and a
target-kind layer selected by bsb_target_kind ("cell", "compartment",
"synapse", "lfp", ...). Per-kind fields are now first-class flat bsb_*
annotations (e.g. bsb_section, bsb_arc, bsb_synapse_type) instead of a nested
bsb_location dict. Built-in recorders emit cell (NEST/Arbor spikes, multimeter),
compartment (NEURON voltage/current clamp) and synapse (NEURON synapse) kinds.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…pse recordings
Replace the NEURON-flavoured section/segment location fields with the BSB-native
morphology address: bsb_branch, bsb_point, bsb_arc, taken straight from the
recorder's location accessor (loc.location -> (branch, point), loc.arc()). Reserve
a proposed bsb_coordinates {x, y, z, r} dict for the resolved point position, which
is also the per-segment geometry an LFP probe consumes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extend SimulationRecorder with a device_name attribute and a meta(property) method, both queryable at runtime before anything is written to file. Every built-in recorder now passes device=self to create_recorder, and a recorder can carry metadata (e.g. an LFP source geometry). This lets a controller find the recorders of the devices it manages and inspect their metadata during a flush, the missing piece for LFP-style probes (see #50).
Also migrate sinusoidal_poisson_generator to the bsb_* convention and tag both Poisson generators with the "stimulus" target kind so the baseline holds.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rding_kind, drop modified_at)
Apply the agreed thread decisions on #236:
- rename the root-metadata `bsb_version` -> `bsb_core_version` (explicit; packages and plugins are already in the plugin manifest)
- drop the `modified_at` timestamps from the root bundle and from per-PlacementSet / per-ConnectivitySet attrs; `state_id` / `revision` already signal change and `created_at` is kept
- rename the recorder discriminator `bsb_target_kind` -> `bsb_recording_kind` (annotates a recording; avoids confusion with `bsb_device_kind`)
- document `host` / `mpi_size` (and the result file's `scaffold.root`) as optional, diagnostic, best-effort last-known values, since reconstructions can be partial
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
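With the discriminator renamed, the layered convention from the earlier commits (a shared baseline plus per-kind fields) can be checked roughly as follows. This is a sketch over plain dicts: the helper `missing_annotations` is hypothetical, and the per-kind field sets are taken from the commit messages and PR description in this thread, not from an enforced schema.

```python
# Baseline keys every recorded object is expected to carry.
BASELINE = (
    "bsb_device_name",
    "bsb_device_kind",
    "bsb_recording_kind",
    "bsb_simulation_id",
    "bsb_segment_id",
)
CELL = ("bsb_ps_name", "bsb_cell_id", "bsb_cell_model")
LOCATION = ("bsb_branch", "bsb_point", "bsb_arc")

# Extra keys selected by the value of bsb_recording_kind.
PER_KIND = {
    "cell": CELL,
    "compartment": CELL + LOCATION,
    "synapse": CELL + LOCATION + ("bsb_synapse_type",),
}


def missing_annotations(annotations):
    """Return the expected keys that are absent, baseline first."""
    kind = annotations.get("bsb_recording_kind")
    required = BASELINE + PER_KIND.get(kind, ())
    return [key for key in required if key not in annotations]


voltage_trace = {
    "bsb_device_name": "vrec",
    "bsb_device_kind": "voltage_recorder",
    "bsb_recording_kind": "compartment",
    "bsb_simulation_id": "sim-1",
    "bsb_segment_id": 0,
    "bsb_ps_name": "granule_cell",
    "bsb_cell_id": 12,
    "bsb_cell_model": "granule_cell",
}
gaps = missing_annotations(voltage_trace)
```

Unknown kinds fall back to the baseline only, which matches the "documented, not enforced" stance: a new kind can be introduced without breaking existing readers.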
…-225
# Conflicts:
# packages/bsb-core/bsb/__init__.py
# packages/bsb-core/bsb/storage/fs/file_store.py
- cache the plugin manifest so repeated engine create() stays within the storage interface test timeout
- emit one spiketrain per targeted cell in the NEST and Arbor spike recorders so population size stays recoverable, and update the simulation tests to the bsb_* annotation convention
- resolve ruff SIM105/I001/E501 findings surfaced by the merge
- reference neo classes via their neo.core.* targets so the bsb-core docs build clean under -nW
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
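To illustrate why one spiketrain per targeted cell keeps the population size recoverable, here is a minimal sketch using plain lists instead of Neo objects. The function `split_by_cell` and its arguments are hypothetical and do not reflect the recorder code in this PR.

```python
def split_by_cell(times, senders, targets):
    """Group spike times by sender, keeping an (empty) train for silent cells."""
    trains = {cell_id: [] for cell_id in targets}
    for t, sender in zip(times, senders):
        trains.setdefault(sender, []).append(t)
    return trains


# Three cells were targeted, but cell 2 never fired.
trains = split_by_cell([0.5, 1.0, 2.0], [1, 3, 1], targets=[1, 2, 3])
```

A single pooled train with a `senders` array would only reveal the cells that spiked; the per-cell layout still reports three trains, one of them empty.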
The classmap entry is stored on the dynamic root, not on the leaf class, so reading self.__class__.classmap_entry crashed the poisson and sinusoidal generators. Reverse-look it up in _device_kind and add a stimulus_train helper so both generators share the baseline annotation path instead of building the SpikeTrain by hand.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             v8     #236   +/-   ##
=====================================
  Coverage      ?   84.00%
=====================================
  Files         ?      132
  Lines         ?    14332
  Branches      ?     1677
=====================================
  Hits          ?    12039
  Misses        ?     1890
  Partials      ?      403
☔ View full report in Codecov by Sentry.
A device already exposes the classmap entry it was configured under on its dynamic attribute (`self.device`), so read that directly instead of reverse-looking-up the dynamic root's classmap.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@drodarie ready for review and for another round of feedback; especially L7 and L8 have been changed
… writer
Drop the separate _atomic_write_json and route the provenance bundle through _atomic_write_bytes (staged outside the discovery dir + os.replace) so the engine keeps a single, reviewed race-safe write path instead of a parallel implementation.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ests
The provenance tests opened storage in a per-rank tempfile.TemporaryDirectory while the engine broadcasts rank-0's root, so whichever rank left its `with` block first removed the shared directory out from under the others, flaking `test_scaffold_exposes_storage_id_state_id_provenance` under mpiexec with an empty provenance bundle. Route the parallel tests through RandomStorageFixture, which derives an MPI-safe root and cleans up collectively in tearDownClass; the single-rank @skip_parallel tests keep their own tempdir.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
nx may write project.json with // or /* */ comments, which json.loads rejects, breaking the monorepo docs conf.py that reads doc dependencies (surfacing as a failed bsb-otel Read the Docs build). Parse it as JSONC.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…json"
This reverts commit 9783250.
I believe |
Another important point: we should provide a utility script to help users update their current reconstruction and simulation files to the new format. Otherwise, these files would not be updated.
Status: first proposal, feedback wanted
This is a first proposal for end-to-end provenance, addressing #225. The set of attributes below is a starting point, not a finished spec; the `bsb_schema_version` fields exist precisely so we can evolve the layout. Please comment on missing attributes, naming, or anything that should be reshaped.
Notation: attributes in `[brackets]` are optional / diagnostic, best-effort, and hold the last known value (useful for sanity-checking, not load-bearing).
Revised per review feedback (thread above)
- `bsb_version` -> `bsb_core_version` (explicit; every package and plugin is also in `plugins`).
- Dropped the `modified_at` timestamps (not worth a write per mutation; `state_id` / `revision` already signal change, `created_at` is kept).
- `host`, `mpi_size`, and the result file's `scaffold.root` are now bracketed optional/last-known, since reconstructions can be built in parts (redo/append).
- Recordings are now layered by a discriminator, `bsb_recording_kind`, so cells, compartments, synapses, LFP, ... each declare their own required fields (Level 8), instead of assuming every recording is a whole cell. (Named for the recording it annotates, distinct from `bsb_device_kind`.)
What this adds
Two artefacts gain provenance: the reconstruction file (the compiled network, written by a storage engine) and the simulation result file (the `.nio` Neo container). A result file back-references the reconstruction it was run against, so a recording can be traced to the exact network state, the software that produced it, and the cell it came from.
Full developer documentation lives under For Developers -> Interfaces -> Storage engines and Simulating Networks -> Simulation results.
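The back-reference can be thought of as a two-part key. The sketch below, which assumes the `scaffold` entry of a result file's `bsb_provenance` holds `storage_id` and `state_id` as listed in the levels that follow, shows how a consumer might classify a result against a reconstruction. The function `trace_status` and its return strings are hypothetical.

```python
def trace_status(result_provenance, storage_id, state_id):
    """Compare a result file's back-reference with a reconstruction's identity."""
    ref = result_provenance["scaffold"]
    if ref["storage_id"] != storage_id:
        return "different reconstruction"
    if ref["state_id"] != state_id:
        return "same reconstruction, different state"
    return "exact state"


# Example values only; real identifiers would be read from the two files.
provenance = {"scaffold": {"storage_id": "abc", "state_id": 4}}
status = trace_status(provenance, storage_id="abc", state_id=5)
```

A mismatch on `state_id` alone means the network file was mutated after the simulation ran, which is usually worth a warning, not an error.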
Level 1 - Reconstruction: root metadata
Written on engine `create()`; `state_id` bumps on every mutating write. Exposed read-only as `scaffold.storage_id`, `scaffold.state_id`, `scaffold.provenance`.
- `storage_id`: permanent UUID4.
- `state_id`: monotonic revision counter.
- `bsb_schema_version`
- `created_at`
- `bsb_core_version`: `bsb-core` version at creation (all packages/plugins are also in `plugins`).
- `engine_name`: `"hdf5"` or `"fs"`.
- `engine_version`
- `plugins`: `{category: {entry_name: {package, version}}}` over all plugin categories.
- `[host]`: `{platform, python_version, hostname, user, cwd}` of the last writer. Diagnostic.
- `[mpi_size]`: `comm.get_size()` of the last writer. Diagnostic.
Level 2 - Reconstruction: per PlacementSet
- `revision`
- `created_at`
- `morphology_hashes`
(Alongside the existing `len`, `morphology_loaders`, `labelsets`, `chunks` attributes.)
Level 3 - Reconstruction: per ConnectivitySet
- `revision`
- `created_at`
Level 4 - Reconstruction: file store meta (per file)
- `content_sha256`
- `producer`: `{package, version}` of whoever stored the file.
Level 5 - Simulation result: Block annotation
Keys of the `bsb_provenance` dict:
- `schema_version`
- `simulation_id`
- `simulation_name`
- `started_at` / `finished_at`
- `wall_seconds`
- `seed`
- `duration_ms` / `resolution_ms`
- `scaffold`: `{storage_id, state_id, [root]}` (`root` = best-effort last-known absolute path of the reconstruction file).
- `plugins`
- `simulator`: `{name, version, extra}` (e.g. NEST modules loaded).
- `[host]`: `{platform, python_version, hostname, user, cwd}`. Diagnostic.
- `[mpi_size]`
Level 6 - Simulation result: per Segment annotations
- `segment_id`
- `checkpoint_index`
- `t_start_ms` / `t_stop_ms`
- `simulator_state`
Level 7 - Simulation result: per recorded object, baseline (every recorder)
The convention is documented, not enforced; a recorder may emit any number of objects. Every recorded Neo object carries this baseline, regardless of what it records. What is recorded uses Neo's native `name` / `units`.
- `bsb_device_name`
- `bsb_device_kind`: `classmap_entry` (e.g. `spike_recorder`, `multimeter`).
- `bsb_recording_kind`: `cell`, `compartment`, `synapse`, `lfp`, `stimulus`, ... Selects the Level 8 fields.
- `bsb_simulation_id`: `simulation_id`.
- `bsb_segment_id`: `segment_id`.
Level 8 - Simulation result: per recorded object, recording-kind extension
On top of the baseline, each `bsb_recording_kind` adds first-class flat `bsb_*` fields (siblings of the baseline keys) that locate its target, using BSB-native morphology addressing (branch / point / arc), never simulator-internal names. Open to feedback / new kinds.
Fields per `bsb_recording_kind`:
- `cell`: `bsb_ps_name`, `bsb_cell_id`, `bsb_cell_model`
- `compartment`: `cell` fields + `bsb_branch`, `bsb_point`, `bsb_arc` (+ proposed `bsb_coordinates` `{x,y,z,r}`)
- `synapse`: `cell` fields + `bsb_branch`, `bsb_point`, `bsb_arc`, `bsb_synapse_type` + presynaptic identity (proposed: `bsb_pre_ps_name`, `bsb_pre_cell_id`)
- `lfp`: (… `bsb_probe`, `bsb_position`)
- `stimulus`: `bsb_target_count`
Built-in recorders: NEST `spike_recorder` / `multimeter` and Arbor `spike_recorder` -> `cell`; NEURON `voltage_recorder` / `current_clamp` -> `compartment`; NEURON `synapse_recorder` -> `synapse`; NEST `poisson_generator` / `sinusoidal_poisson_generator` -> `stimulus`. The `lfp` kind has no built-in recorder yet. The proposed `bsb_coordinates` is also the per-segment geometry an LFP probe needs.
Recorder interface (runtime inspection)
To support controller-style devices (e.g. an LFP probe per #50), a `SimulationRecorder` is inspectable at runtime, before anything is written to file:
- `recorder.device_name` links it back to the device that created it (every built-in recorder passes `device=self`).
- `recorder.meta(property)` exposes recorder-level metadata (e.g. `recorder.meta("lfp_source_geometry")`).
Combined with the per-object `bsb_device_name` annotation, this lets a controller find the recorders of the devices it manages and query their metadata during a flush. The remaining piece for a functioning LFP probe, per-checkpoint flushing of results, is tracked separately in #50.
Reader helper
`from bsb import read_nio, iter_recordings` flattens a result file into `Recording` records; filter by `ps_name`, `cell_id`, `device`, recorded quantity, or any `bsb_*` annotation key (e.g. `bsb_recording_kind`, `bsb_branch`).
Compatibility
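- Legacy files without a bundle are backfilled on first write (with a `BsbProvenanceUpgradeWarning`); read-only opens leave `storage_id` / `state_id` as `None`.
- Breaking: recorder output annotations changed from the ad-hoc `device` / `senders` / `cell_type` / `cell_id` keys to the layered `bsb_*` convention.
Test plan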
- `bsb-hdf5` and `bsb-core` provenance unit tests (root attrs round-trip, state bumping, legacy auto-upgrade, FS `metadata.json` migration, Scaffold API, Block/Segment/recorder annotations, baseline + recording-kind layering, recorder `device_name` / `meta()` runtime inspection, `iter_recordings` filtering)
- `mpiexec -n 2` (FS provenance writes locked; single-rank-only assertions marked `skip_parallel`)
- `bsb-core` / `bsb-hdf5` suites still green
- `check-api` passes; full docs build passes with zero warnings
- `bsb_version` -> `bsb_core_version`, dropped `modified_at`, `bsb_target_kind` -> `bsb_recording_kind`, bracketed `host` / `mpi_size` / `root`
🤖 Generated with Claude Code
📚 Documentation preview 📚: https://bsb-nest--236.org.readthedocs.build/en/236/
📚 Documentation preview 📚: https://bsb-hdf5--236.org.readthedocs.build/en/236/
📚 Documentation preview 📚: https://bsb-arbor--236.org.readthedocs.build/en/236/
📚 Documentation preview 📚: https://bsb--236.org.readthedocs.build/en/236/
📚 Documentation preview 📚: https://bsb-core--236.org.readthedocs.build/en/236/
📚 Documentation preview 📚: https://bsb-neuron--236.org.readthedocs.build/en/236/