fix!: Let writing data to file at flush #213
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #213 +/- ##
==========================================
+ Coverage 80.54% 85.07% +4.53%
==========================================
Files 188 123 -65
Lines 18179 12590 -5589
Branches 2174 1472 -702
==========================================
- Hits 14642 10711 -3931
+ Misses 2977 1548 -1429
+ Partials 560 331 -229
Helveg left a comment
Looks good! Double-check what's going on with multi-sim and multi-block.
- Add a test with a single sim that writes 3 blocks, and verify that the data goes into the correct blocks
- Add a test with 3 sims that each write to 2-3 blocks, and also verify integrity
- Add a test that tries to write 10 GB to disk and check that memory stays flat.
Co-authored-by: Robin De Schepper <robin.deschepper93@gmail.com>
It is in preparation for integrating bsb-otel and its related branch. It is triggered on every PR push, so don't worry if it doesn't pass the tests.
Helveg left a comment
Good! I'd just like it to be a bit clearer how strong the proof of the memory test is. Ideally we could run it with a debug flag that produces a graph or similar, so that we have visual evidence now and can easily take a look later in case of regressions.
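One possible shape for such a memory trace, sketched under assumptions: a writer loop records Python-level traced memory after every flush using the standard `tracemalloc` module, and the samples can be dumped as CSV for plotting. `stream_chunks` and `dump_samples` are hypothetical names, and the zero-filled byte chunks stand in for real recorded data. Note that `tracemalloc` only sees Python allocations, not memory held by NEST, NEURON or NIX/HDF5 native code, so a real test would probably need to sample process RSS as well.

```python
import csv
import tracemalloc


def stream_chunks(path, n_chunks, chunk_bytes):
    """Write n_chunks zero-filled chunks to path; return traced memory after each flush."""
    samples = []
    tracemalloc.start()
    try:
        with open(path, "wb") as fh:
            for _ in range(n_chunks):
                chunk = bytes(chunk_bytes)  # stand-in for one flushed batch of recordings
                fh.write(chunk)
                fh.flush()
                del chunk  # release the batch so nothing accumulates between flushes
                current, _peak = tracemalloc.get_traced_memory()
                samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


def dump_samples(samples, out):
    """Write (chunk index, traced bytes) rows as CSV so the trace can be plotted."""
    writer = csv.writer(out)
    writer.writerow(["chunk", "traced_bytes"])
    for index, value in enumerate(samples):
        writer.writerow([index, value])
```

Writing the CSV only when a debug flag is set would give the visual evidence asked for; a flat line across chunks is the expected shape, and a rising one would flag a regression.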
…s and spiket methods
@Helveg now all the things discussed should be addressed
Write each run to its own neo block with an auto-generated unique storage key, so re-running a simulation (or composing several) into one file appends blocks instead of overwriting same-named ones. Stamp each block with sim name, timestamp and run index. The streamed result no longer serves data from memory: .block raises and the ambiguous .spiketrains/.analogsignals accessors are removed (read the file back instead). write() copies the file in streamed mode. Tests: replace the misleading triple-sim test with an append test and a same-name overwrite regression, add a NEST composition test, migrate the remaining results.spiketrains/analogsignals callers, and drop the dead neuron TestCheckpoints class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
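A minimal sketch of why an auto-generated storage key turns reruns into appends, assuming a plain dict stands in for the result file and a small dataclass for the neo block. `Block`, `write_run` and the annotation keys are hypothetical; the commit does not specify the key format, so a UUID is used here purely for illustration.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Block:
    """Hypothetical stand-in for a neo Block: a name plus free-form annotations."""
    name: str
    annotations: dict = field(default_factory=dict)


def write_run(store, sim_name, run_index):
    """Append one run to `store` (a dict standing in for the result file).

    The storage key is a fresh UUID, so two runs of the same simulation never
    collide; the readable sim name is kept as the block name and annotation.
    """
    key = uuid.uuid4().hex
    store[key] = Block(
        name=sim_name,
        annotations={
            "simulation": sim_name,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "run_index": run_index,
        },
    )
    return key
```

Keying the dict by `sim_name` instead would make the second `write_run` silently replace the first block, which is the overwrite this commit removes.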
Move the composition test to NEURON: NEST cannot compose two simulations in one adapter (its kernel locks the time resolution once nodes exist), so the composition+streaming path is only exercisable on NEURON, where it is already a supported path. Fix the same-name rerun test: the checkpoint controller keeps its step counter across run_simulation calls, so the second run does not reproduce the first run's segment count. Assert instead that the first run's block survives the rerun intact (no overwrite) and the second run writes its own block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The block property's return annotation made sphinx-build -nW fail with an unresolved py:class reference to neo.Block. Drop the annotation to match the original untyped attribute. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The spike controller initialised its step counter in __init__, so a sim object reused across run_simulation calls kept a stale counter and the second run produced no checkpoints. Reset it in implement(), which runs once per prepare, so every run re-checkpoints. The rerun test now asserts both blocks keep their full 11 segments. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
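The counter bug can be reproduced with a small stand-in class. `SpikeController`, its `checkpoints` generator and the interval/duration parameters are hypothetical simplifications; only the idea of resetting the counter in `implement()` rather than relying on `__init__` comes from the commit.

```python
class SpikeController:
    """Hypothetical simplified checkpoint controller (not the bsb class)."""

    def __init__(self, interval, duration):
        self.interval = interval
        self.duration = duration
        self._step = 0  # set once here; on its own this survives reuse across runs

    def implement(self):
        # Called once per prepare: resetting here gives every run a fresh counter.
        self._step = 0

    def checkpoints(self):
        """Yield checkpoint times from the current step until the run ends."""
        while self._step * self.interval <= self.duration:
            yield self._step * self.interval
            self._step += 1
```

Calling `implement()` before each run yields the same checkpoints twice; skipping it leaves the counter exhausted, so the second run yields nothing, mirroring the symptom described above.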
NEST freezes the kernel time resolution once nodes are created, so setting it per-simulation in set_settings broke composition: the second sim's prepare raised "time representation cannot be changed". Set the resolution once, up front in simulate() before any prepare, and reject composing simulations with differing resolutions (NEST has a single kernel resolution). This lets NEST simulations be composed into one file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
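The set-once-then-prepare ordering can be sketched as follows, assuming each simulation exposes its resolution as a dict entry and that setting the kernel resolution and preparing a simulation are passed in as callables. In real NEST code the setter would be along the lines of `nest.SetKernelStatus({"resolution": ...})`; `common_resolution` and `simulate` here are hypothetical names.

```python
def common_resolution(simulations):
    """Return the single time resolution shared by all simulations to compose.

    NEST has one kernel-wide resolution that is frozen once nodes exist, so
    mixed resolutions cannot be composed and are rejected up front.
    """
    resolutions = {sim["resolution"] for sim in simulations}
    if len(resolutions) != 1:
        raise ValueError(
            f"Cannot compose simulations with differing resolutions: {sorted(resolutions)}"
        )
    return resolutions.pop()


def simulate(simulations, set_kernel_resolution, prepare):
    """Set the kernel resolution exactly once, then prepare every simulation."""
    resolution = common_resolution(simulations)
    set_kernel_resolution(resolution)  # before any prepare creates nodes
    for sim in simulations:
        prepare(sim)
```

Validating before touching the kernel means a mismatched composition fails with a clear message instead of NEST's "time representation cannot be changed" error halfway through preparation.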
The NEURON composition test writes a result file via the streamed flush path, which is not MPI-safe (unguarded shared-file writes), so it must be skipped under parallel testing like every other checkpoint/result test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The skip was unjustified: the test passes under MPI (per-rank temp files, no shared-file collision), and the CI failure that prompted it was an unrelated flaky hang in bsb-core's MPI run, not this test. Keep the MPI coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
API surface rectifications (SimulationResult / streamed checkpoints)
We reworked the streamed-result interface so the in-memory and streamed paths share one clear contract. Summary of what changed and why.
Block identity
Previously both the neo block name and the NIX storage key were set to
Reading
Writing
Modes
NEST composition
Bugs fixed along the way
Tests
Open follow-up
The test runners have fewer cores than the `mpiexec -n 2` ranks, so OpenMPI's default busy-wait polling lets a rank spinning on an RMA/collective starve its peer and deadlock mpilock, intermittently hanging the parallel test run. Set OMPI_MCA_mpi_yield_when_idle=1 so idle ranks yield the CPU. Reproduced locally by pinning both ranks to one core: the pool-heavy tests hang without this flag and pass with it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Setting OMPI_MCA_mpi_yield_when_idle at the workflow step level leaked into examples:test, which runs serially without the parallel extra; bsb then saw the OMPI_* env var and raised "MPI runtime detected without parallel support". Pass the setting as `mpiexec --mca mpi_yield_when_idle 1` instead, so it applies only to the parallel test ranks (which have parallel support) and never to serial processes. Verified locally by pinning ranks to one core: the pool tests hang without it and pass with it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
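To see why a step-level environment variable leaks into serial runs, consider a simplified detector that treats any `OMPI_*` variable as a sign of an Open MPI launch. This is a hypothetical heuristic for illustration, not bsb's actual check; only the error message is taken from the commit, and `mpi_runtime_detected` and `check_parallel_support` are made-up names.

```python
import os


def mpi_runtime_detected(environ=None):
    """Hypothetical heuristic: any OMPI_* variable is taken as an Open MPI launch."""
    environ = os.environ if environ is None else environ
    return any(name.startswith("OMPI_") for name in environ)


def check_parallel_support(has_parallel_extra, environ=None):
    """Refuse to run serially when the environment looks like an MPI launch."""
    if mpi_runtime_detected(environ) and not has_parallel_extra:
        raise RuntimeError("MPI runtime detected without parallel support")
```

With the variable exported for the whole step, a serial process without the parallel extra trips this check; passing `--mca mpi_yield_when_idle 1` on the `mpiexec` command line keeps the setting out of the serial process's environment.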
Add a one-line comment above each `mpiexec --mca mpi_yield_when_idle 1` command so the non-obvious flag is self-explanatory in the test target. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nx tolerates JSONC comments in project.json, but the Sphinx conf reads the same file with strict json.loads to get project metadata, so the // comment made every docs build (and all Read the Docs builds) fail. Remove the comments; the flag's rationale stays in the commit that added it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nx writes project.json as JSONC, but sphinxext-bsb read it with strict json.loads, so a // comment broke every docs build. Strip comments (while preserving string contents) before parsing. This lets the project.json test commands carry an explanatory comment for the mpi_yield_when_idle flag. Verified: bsb-core iso-docs builds locally with the comments present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
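A comment-stripping pass of the kind described can be written as a small scanner that tracks whether it is inside a string literal, so `//` inside a URL or after an escaped quote is left alone. This is an illustrative sketch, not the sphinxext-bsb code; `strip_jsonc_comments` and `load_jsonc` are hypothetical names.

```python
import json


def strip_jsonc_comments(text):
    """Remove // and /* */ comments from JSONC text, leaving string literals intact."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character verbatim so \" does not end the string
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip up to, but keep, the newline
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("Unterminated block comment")
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_jsonc(text):
    """Parse JSONC by stripping comments, then using the strict json parser."""
    return json.loads(strip_jsonc_comments(text))
```

This only removes comments; other JSONC relaxations such as trailing commas would still be rejected by `json.loads`.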
Good, thanks. Just one the
We can change this to a breaking change; I already have another PR targeting a new
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Resolve packages/bsb-core/project.json: keep main's `-t .` discover root and #213's `--mca mpi_yield_when_idle 1` flag + comment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OK, so we can restore the next branch and merge into it, if you are OK with that.
Describe the work done
This fixes the simulation checkpoint integration to match the expected behavior: when results are flushed, they are now written to file instead of being kept in memory. As discussed, backward compatibility is preserved: the new behavior is triggered only when the `run_simulation` method is called with the `output_filename` argument, which will be the default when the CLI is used.
Refer to issue: #50
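The opt-in can be sketched as a factory that picks the mode from `output_filename`. `InMemoryResult`, `StreamedResult` and `make_result` are hypothetical names, and JSON lines stand in for the NIX file the PR actually writes; the point is only that the streamed object keeps a counter rather than the flushed data.

```python
import json


class InMemoryResult:
    """Hypothetical default mode: every flushed segment stays in memory."""

    def __init__(self):
        self.segments = []

    def flush(self, segment):
        self.segments.append(segment)


class StreamedResult:
    """Hypothetical streamed mode: each flush is appended to the file and dropped."""

    def __init__(self, output_filename):
        self.output_filename = output_filename
        self.n_flushed = 0

    def flush(self, segment):
        with open(self.output_filename, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(segment) + "\n")
        self.n_flushed += 1  # only a counter is kept; the data itself is not retained


def make_result(output_filename=None):
    """Stream to disk only when an output filename is given, else keep in memory."""
    if output_filename is None:
        return InMemoryResult()
    return StreamedResult(output_filename)
```

Keeping the in-memory class as the default is what preserves backward compatibility for callers that never pass a filename.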
closes #232
Tasks
📚 Documentation preview 📚: https://bsb-nest--213.org.readthedocs.build/en/213/
📚 Documentation preview 📚: https://bsb-neuron--213.org.readthedocs.build/en/213/
📚 Documentation preview 📚: https://bsb-core--213.org.readthedocs.build/en/213/
📚 Documentation preview 📚: https://bsb-arbor--213.org.readthedocs.build/en/213/