Skip to content

fix!: Let writing data to file at flush#213

Open
filimarc wants to merge 42 commits into
nextfrom
fix/checkpoint_flush
Open

fix!: Let writing data to file at flush#213
filimarc wants to merge 42 commits into
nextfrom
fix/checkpoint_flush

Conversation

@filimarc
Copy link
Copy Markdown
Contributor

@filimarc filimarc commented Jan 27, 2026

Describe the work done

Add a fix to the simulation checkpoint integration, since it is the expected behavior. Now when results are flushed they are written to file without storing memory. As discussed backward compatibility is preserved, the new behavior is triggered only when run_simulation method is called specifying output_filename argument, this will be the default when CLI is used.

Refer to issue: #50
closes #232

Tasks


📚 Documentation preview 📚: https://bsb-nest--213.org.readthedocs.build/en/213/


📚 Documentation preview 📚: https://bsb-neuron--213.org.readthedocs.build/en/213/


📚 Documentation preview 📚: https://bsb-core--213.org.readthedocs.build/en/213/


📚 Documentation preview 📚: https://bsb-arbor--213.org.readthedocs.build/en/213/

@github-actions github-actions Bot added the fix label Jan 27, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 87.03704% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.07%. Comparing base (80ffbd5) to head (5ad6d53).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
packages/bsb-core/bsb/simulation/results.py 79.31% 5 Missing and 1 partial ⚠️
packages/bsb-nest/bsb_nest/adapter.py 90.90% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #213      +/-   ##
==========================================
+ Coverage   80.54%   85.07%   +4.53%     
==========================================
  Files         188      123      -65     
  Lines       18179    12590    -5589     
  Branches     2174     1472     -702     
==========================================
- Hits        14642    10711    -3931     
+ Misses       2977     1548    -1429     
+ Partials      560      331     -229     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@filimarc filimarc requested a review from Helveg January 28, 2026 09:20
Copy link
Copy Markdown
Contributor

@Helveg Helveg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Double check what's going on with multi-sim, and multi-block.

  • Add a test with a 1 sim that writes 3 blocks, and verify data goes into correct blocks
  • Add a test with 3 sims that each write to 2-3 blocks and also verify integrity
  • Add a test that tries to write 10G to disk and check that memory stays flat.

Comment thread packages/bsb-core/bsb/cli/commands/_commands.py Outdated
Comment thread packages/bsb-core/bsb/simulation/results.py Outdated
@filimarc filimarc requested a review from Helveg April 17, 2026 10:22
@filimarc
Copy link
Copy Markdown
Contributor Author

I've add the tests suggested, for three sim on the same file and to check the flushing with bsb-neuron.

@Helveg @drodarie why is it compiling bsb-otel readthedocs if is not in this branch?

@drodarie
Copy link
Copy Markdown
Contributor

It is in preparation for the integration of the bsb-otel and related its branch. It is triggered for every PR push so do not worry if it does not pass the tests.

Copy link
Copy Markdown
Contributor

@Helveg Helveg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good! I'd just like a bit clearer how strong the proof of the memory test is. Ideally we can run it with a debug flag to produce a graph or something so that we can now have some visual evidence and later in case of regressions easily can have a look.

Comment thread packages/bsb-core/bsb/simulation/results.py Outdated
Comment thread packages/bsb-neuron/tests/test_neuron.py Outdated
@filimarc
Copy link
Copy Markdown
Contributor Author

@Helveg now all the things discussed should be adressed

@drodarie drodarie requested a review from Helveg May 15, 2026 13:16
Helveg and others added 7 commits May 28, 2026 19:50
Write each run to its own neo block with an auto-generated unique storage
key, so re-running a simulation (or composing several) into one file
appends blocks instead of overwriting same-named ones. Stamp each block
with sim name, timestamp and run index.

The streamed result no longer serves data from memory: .block raises and
the ambiguous .spiketrains/.analogsignals accessors are removed (read the
file back instead). write() copies the file in streamed mode.

Tests: replace the misleading triple-sim test with an append test and a
same-name overwrite regression, add a NEST composition test, migrate the
remaining results.spiketrains/analogsignals callers, and drop the dead
neuron TestCheckpoints class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the composition test to NEURON: NEST cannot compose two simulations
in one adapter (its kernel locks the time resolution once nodes exist), so
the composition+streaming path is only exercisable on NEURON, where it is
already a supported path.

Fix the same-name rerun test: the checkpoint controller keeps its step
counter across run_simulation calls, so the second run does not reproduce
the first run's segment count. Assert instead that the first run's block
survives the rerun intact (no overwrite) and the second run writes its own
block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The block property's return annotation made sphinx-build -nW fail with an
unresolved py:class reference to neo.Block. Drop the annotation to match
the original untyped attribute.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The spike controller initialised its step counter in __init__, so a sim
object reused across run_simulation calls kept a stale counter and the
second run produced no checkpoints. Reset it in implement(), which runs
once per prepare, so every run re-checkpoints. The rerun test now asserts
both blocks keep their full 11 segments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NEST freezes the kernel time resolution once nodes are created, so setting
it per-simulation in set_settings broke composition: the second sim's
prepare raised "time representation cannot be changed". Set the resolution
once, up front in simulate() before any prepare, and reject composing
simulations with differing resolutions (NEST has a single kernel
resolution). This lets NEST simulations be composed into one file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The NEURON composition test writes a result file via the streamed flush
path, which is not MPI-safe (unguarded shared-file writes), so it must be
skipped under parallel testing like every other checkpoint/result test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The skip was unjustified: the test passes under MPI (per-rank temp files,
no shared-file collision), and the CI failure that prompted it was an
unrelated flaky hang in bsb-core's MPI run, not this test. Keep the MPI
coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Helveg
Copy link
Copy Markdown
Contributor

Helveg commented May 29, 2026

API surface rectifications (SimulationResult / streamed checkpoints)

We reworked the streamed-result interface so the in-memory and streamed paths share one clear contract. Summary of what changed and why.

Block identity

Previously both the neo block name and the NIX storage key were set to simulation.name (nix_name=simulation.name). neo's write_block deletes and recreates any block whose nix_name already exists, so re-running a simulation (or running two same-named sims) into one file silently overwrote the earlier block.

  • Storage key is now auto-generated and unique per run (we no longer pass nix_name).
  • The block label stays simulation.name (human-readable, may repeat across runs).
  • Each block stamps run metadata (sim name, timestamp, run index).
  • Writing is append-only: re-running into the same file adds a new block, never overwrites.
  • flush() addresses its block by the unique key, not by a name match.

Reading

  • .block is now a property: returns the in-memory block, and in streamed mode raises with a clear message pointing at the file (instead of returning an inert object).
  • Removed the top-level .spiketrains / .analogsignals accessors. In streamed mode they silently returned [], and they only ever exposed segments[0] (the first checkpoint), which is misleading once checkpointing splits a run across segments. Data is read per segment (segment.spiketrains / segment.analogsignals); a proper results-reading utility is left as a follow-up.

Writing

  • .write(dest) writes the in-memory block as before; in streamed mode it copies the file (previously it was a silent no-op).

Modes

  • Two modes, fixed at construction: in-memory (no output_filename) and streamed (output_filename given). The CLI always streams.

NEST composition

  • NEST freezes the kernel time resolution once nodes are created, so setting it per-simulation in set_settings broke composing multiple simulations into one adapter (the second prepare raised "time representation cannot be changed"). Resolution is now set once, up front in simulate() before any neurons are created, and composing simulations with differing resolutions is rejected (NEST has a single kernel resolution). NEST simulations can now be composed into one file.

Bugs fixed along the way

  • Same-name rerun overwrite (see Block identity).
  • Checkpoint controllers kept their step counter across run_simulation calls, so a reused simulation object did not re-checkpoint. The counter now resets per run.

Tests

  • Replaced the misleading triple-sim test (name said "triple", docstring said "two", and it only used distinct names so it never exercised the overwrite path) with an append test and a same-name overwrite regression.
  • Added NEST and NEURON composition tests.
  • Migrated the remaining results.spiketrains / results.analogsignals callers to result.block.segments[0]....

Open follow-up

  • Streamed checkpoint writing is not yet rank-aware, so its behaviour under MPI (rank-0-gated write after a gather, or parallel I/O) needs a dedicated design pass. This matters for HPC runs and is worth a separate issue.

Helveg and others added 5 commits May 29, 2026 12:01
The test runners have fewer cores than the `mpiexec -n 2` ranks, so
OpenMPI's default busy-wait polling lets a rank spinning on an RMA/collective
starve its peer and deadlock mpilock, intermittently hanging the parallel
test run. Set OMPI_MCA_mpi_yield_when_idle=1 so idle ranks yield the CPU.

Reproduced locally by pinning both ranks to one core: the pool-heavy tests
hang without this flag and pass with it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Setting OMPI_MCA_mpi_yield_when_idle at the workflow step level leaked into
examples:test, which runs serially without the parallel extra; bsb then saw
the OMPI_* env var and raised "MPI runtime detected without parallel
support". Pass the setting as `mpiexec --mca mpi_yield_when_idle 1` instead,
so it applies only to the parallel test ranks (which have parallel support)
and never to serial processes. Verified locally by pinning ranks to one core:
the pool tests hang without it and pass with it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a one-line comment above each `mpiexec --mca mpi_yield_when_idle 1`
command so the non-obvious flag is self-explanatory in the test target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nx tolerates JSONC comments in project.json, but the Sphinx conf reads the
same file with strict json.loads to get project metadata, so the // comment
made every docs build (and all Read the Docs builds) fail. Remove the
comments; the flag's rationale stays in the commit that added it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nx writes project.json as JSONC, but sphinxext-bsb read it with strict
json.loads, so a // comment broke every docs build. Strip comments (while
preserving string contents) before parsing. This lets the project.json
test commands carry an explanatory comment for the mpi_yield_when_idle flag.

Verified: bsb-core iso-docs builds locally with the comments present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@filimarc
Copy link
Copy Markdown
Contributor Author

Good, thanks. Just one the .spiketrains`.analogsignals` interfaces where left to be fully backward compatible. I don't know who is using this things, but maybe we can avoid to flag it as breaking change.

@Helveg
Copy link
Copy Markdown
Contributor

Helveg commented May 29, 2026

We can change this to a breaking change, I already have another PR targetting a new v8 branch that is related. The changes required to the simulation results etc in prep of the rest of the LFP will most likely be breaking too.

Helveg and others added 2 commits May 29, 2026 22:23
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Resolve packages/bsb-core/project.json: keep main's `-t .` discover root
and #213's `--mca mpi_yield_when_idle 1` flag + comment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@filimarc
Copy link
Copy Markdown
Contributor Author

Ok, so we can restore the next branch aand merge to it you are ok.

@filimarc filimarc changed the title fix: Let writing data to file at flush fix!: Let writing data to file at flush May 30, 2026
@filimarc filimarc changed the base branch from main to next May 30, 2026 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bsb-nest is not compatible with NEST 3.10+

3 participants