ci: green up modular-viscy-staging (CI matrix, deps, lint)#435
Conversation
All workspace packages require python>=3.12 but the CI matrix included 3.11, causing uv sync to fail with 'incompatible with the project's Python requirement: >=3.12' and fail-fast cancelling all other matrix jobs. Also extend push/pull_request triggers to modular-viscy-staging so PRs targeting the 0.5 staging branch run lint/test.
- packages/viscy-data tests import viscy_transforms (test_combined.py, test_hcs.py, test_hcs_weighted_crop.py) but it was not declared in the test group, causing ModuleNotFoundError in Test Data Extras CI. - applications/dynacell/src/dynacell/evaluation/pipeline.py:9 imports hydra unconditionally. hydra-core was only present in the 'eval' and 'report' optional extras, so 'uv sync --group test' (used by Test dynacell benchmark configs CI) failed at collection with ModuleNotFoundError: No module named 'hydra'. Promote hydra-core to a base dependency since it's needed at import time.
…st to tests/ The file is a Jupyter-style demo script (# %% cells) with a hardcoded absolute home path /home/eduardo.hirata/... It is meant to be invoked via 'uv run python ...', not collected by pytest, but its test_ prefix caused 'Test (dynaclr)' CI to fail with FileNotFoundError on the missing parquet at collection time. - Rename to demo_2d_mip_augmentation.py so pytest does not collect it. - Add testpaths=['tests'] under [tool.pytest.ini_options] in dynaclr's pyproject.toml so future demo scripts under scripts/ are not picked up when CI invokes pytest from applications/dynaclr/.
Auto-fixes from prek/pre-commit hooks: - ruff format: distance.py, lca.py, normalize.py, precompute.py, conftest.py (5 files reformatted, no semantic change) - end-of-file-fixer: append trailing newline to .envrc Manual fixes: - E741: rename ambiguous 'O' to 'Ov' (overlap) in celldiff_wrapper.py sliding-window patcher. - D100: add module docstrings to plot_dim_reduct.py, tau_sampling.py, channel_dropout.py. - D102: add forward() docstring in ChannelDropout. - NPY002: replace np.random.randn() with np.random.default_rng() in test_mp_utils.py. The renamed demo_2d_mip_augmentation.py is in this commit because ruff format touched its imports as part of the same auto-pass.
There was a problem hiding this comment.
Pull request overview
Restores green CI for the modular-viscy-staging branch by aligning the CI Python matrix with the repo’s requires-python>=3.12, fixing missing runtime/test dependencies, preventing pytest from collecting demo scripts, and applying lint/format cleanups.
Changes:
- Update CI workflows to run on
modular-viscy-stagingand drop Python 3.11 from the test matrix. - Fix dependency gaps (
viscy-transformsinviscy-datatest group;hydra-coreas a base dep fordynacell) and refreshuv.lock. - Prevent accidental pytest collection of demo scripts in
dynaclr, plus ruff/format and small reproducibility tweaks in tests.
Reviewed changes
Copilot reviewed 16 out of 18 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
uv.lock |
Lockfile regeneration reflecting updated dependency declarations. |
packages/viscy-utils/tests/test_mp_utils.py |
Makes the test deterministic via a seeded NumPy Generator API. |
packages/viscy-utils/src/viscy_utils/precompute.py |
Ruff-format cleanup (no functional intent apparent). |
packages/viscy-utils/src/viscy_utils/normalize.py |
Ruff-format cleanup (no functional intent apparent). |
packages/viscy-utils/src/viscy_utils/evaluation/lca.py |
Minor print formatting refactor for reports. |
packages/viscy-utils/src/viscy_utils/evaluation/distance.py |
Ruff-format cleanup (no functional intent apparent). |
packages/viscy-data/src/viscy_data/channel_dropout.py |
Adds module/method docstrings to satisfy docstring linting. |
packages/viscy-data/pyproject.toml |
Adds viscy-transforms to the test dependency group. |
applications/qc/tests/conftest.py |
Ruff-format cleanup in test fixture generation. |
applications/dynaclr/src/dynaclr/data/tau_sampling.py |
Adds module docstring to satisfy linting. |
applications/dynaclr/scripts/plotting/plot_dim_reduct.py |
Adds module docstring to satisfy linting. |
applications/dynaclr/scripts/dataloader_inspection/demo_2d_mip_augmentation.py |
Renames demo script and updates usage text to avoid pytest collection. |
applications/dynaclr/pyproject.toml |
Configures pytest testpaths to scope collection to tests/. |
applications/dynacell/src/dynacell/celldiff_wrapper.py |
Renames ambiguous loop variable (O → Ov) and minor formatting. |
applications/dynacell/pyproject.toml |
Promotes hydra-core to a base dependency to match unconditional import usage. |
.github/workflows/test.yml |
Drops Python 3.11 from matrix; runs CI for modular-viscy-staging. |
.github/workflows/lint.yml |
Runs lint CI for modular-viscy-staging. |
.envrc |
Adds trailing newline / fixes EOF formatting. |
Comments suppressed due to low confidence (1)
applications/dynacell/pyproject.toml:50
hydra-coreis now a base dependency, but it’s still listed inoptional-dependencies.evalandoptional-dependencies.report. Consider removing it from those extras to avoid redundant declarations and keep the meaning of the extras clear (then regenerateuv.lock).
dependencies = [
"hydra-core>=1.2",
"lightning>=2.3",
"monai",
"omegaconf",
"pydantic>=2",
"viscy-data[mmap]",
"viscy-models[celldiff]",
"viscy-transforms",
"viscy-utils",
]
optional-dependencies.eval = [
"accelerate>=1.13",
"aicsmlsegment",
"aicssegmentation",
"cellpose",
"cubic==0.7.0a2",
"dynaclr",
"hydra-core>=1.2",
"iohub",
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Resolve conflicts from the base-branch ruff/style commit (0f63d97) overlapping the CI green-up changes. All conflicts were cosmetic (docstring wording, rng inlining); kept the PR-side versions. Fixed an auto-merge artifact in plot_dim_reduct.py that stacked two module docstrings and pushed an import below code (E402). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… e2e test from_cell_index returns (ExperimentRegistry, DataFrame); the fixture assigned the tuple to a bare `registry` and passed it to MultiExperimentIndex, which raised `AttributeError: 'tuple' object has no attribute 'experiments'` at setup of every TestFlatParquetRegistry / TestChannelModes / TestSamplerGroupings test. Unpack and discard the DataFrame (the fixture builds its own). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nfigs celldiff's predict__ipsc_confocal.yml was split into per-method variants (__denoise, __iterative, __sliding_window), but test_predict_leaf_composes still expected the single file for every model in PREDICT_LEAVES, raising FileNotFoundError for all four celldiff organelles. Add PREDICT_LEAF_FILES parametrizing celldiff's three variants plus unetvit3d's single file, and switch the predict-composition test to it. PREDICT_LEAVES is retained for the eval symlink test, whose eval__ipsc_confocal.yaml is unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
iohub auto-enables the Rust `zarrs` codec pipeline whenever `zarrs` is importable (it ships as a transitive dep of iohub 0.3.x). `zarrs` 0.2.3 builds a rank-6 read offset for rank-5 sharded zarr-v3 arrays, raising `RuntimeError: incompatible offset [0,0,0,0,0,0] for region with start [0,0,0,0,0]` on every sharded v3 read, and deadlocks spawned DDP ranks. iohub re-applies its codec choice on every store open, so a one-time `zarr.config.set(...)` is clobbered by the next `open_ome_zarr`. Override iohub's ZarrConfig field default instead, applied on import of viscy_data (the lowest zarr-reading package; everything else depends on it) so the native `BatchedCodecPipeline` is in place before any store is opened. Fixes the zarr_v3 failures in test-data / test-data-extras and the test_combined_ddp hang. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
torch_fidelity is in dynacell's [eval] extra, but feature_metrics.py imported its metric helpers at module top. The minimal CI test job (test group only, no eval extras) imports the eval pipeline via test_evaluation_grouped.py, so collection failed with `ModuleNotFoundError: No module named 'torch_fidelity'` at pipeline.py:18 -> feature_metrics.py:18. Move the four torch_fidelity imports into the _fid/_kid/_bootstrap_prc/ _mind functions that use them. The cache-only grouped parity test short-circuits before any metric runs, so the eval pipeline now imports without torch_fidelity, matching the CI job's documented minimal-env intent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
configs/training/fit.yml and multi_experiment_fit.yml were removed in the config reorg (ecf7c08 "refactor training configs into composable _base/arch/data layers"), but two tests still asserted they exist and that their class_paths resolve, failing with "Config file not found". Point them at the current full demo configs: demo/demo_2d_fit.yml and demo/demo_bag_of_channels_v3_fit.yml. The predict.yml parametrization is unchanged (it still exists). All resolved class_paths import cleanly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…0|h200) test_4gpu_train_leaves_inherit_a100_exclude asserted constraint 'h100|h200|a40|a6000|l40s', but hardware_4gpu.yml sets 'h100|h200' and its prose explains why: 4-GPU FCMAE/UNeXt2 train at large patches where a single DDP rank needs 30-50 GB, so the 48 GB A40/A6000/L40S cards have no headroom. The config is authoritative; the test constant was inconsistent (both landed together in #404). Update the expected constraint and the test's rationale comment to match. The file's override-hint (opt in to smaller cards) is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test_mmap_preload_multi_process_sharing defined its child target as a nested closure, which fails under the spawn start method (Windows/macOS default) with `AttributeError: Can't get local object`. fork (Linux) inherits the function so it passed there. Move the child to module level as _mmap_sharing_child and pass the data path explicitly instead of closing over the fixture. Verified passing under a forced spawn context. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… leaves The config layout was refactored but test_benchmark_config_composition.py still asserted the old layout, failing once collection-blocking imports were fixed. Reconcile the tests against the current configs: - test_a549_predict_leaf_composes: predict__a549_mantis.yml was split per treatment condition (_mock/_denv/_zikv). Generate expectations over gene-info x model x condition; assert the condition-specific data_path, dataset_ref, and experiment_id each composed leaf actually produces. - test_migrated_target_predict_resolves_to_test_store: celldiff's predict__ipsc_confocal.yml was split per inference method; point celldiff at the sliding_window representative (all three resolve the same store). - test_joint_train_leaf_composes / _smoke_: the joint train leaf was redesigned to single-GPU H200 (strategy auto, devices 1, bare-string channels) and the a549 child path moved to mantis_v1/train/SEC61B_all.zarr. Update assertions to match; topology still satisfies the SLURM invariant. Configs unchanged. Full file: 165 passed, 52 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Making the child target module-level was necessary but not sufficient: under the spawn start method (Windows default) the child re-imports the test module, which pytest loaded via --import-mode=importlib, failing with ModuleNotFoundError: No module named 'packages'. Use an explicit fork context (child inherits the loaded module, no re-import) and skip on Windows where fork is unavailable -- the same pattern test_combined_ddp.py already uses for this exact issue. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Engine's isinstance check targets viscy_models' NTXentLoss
subclass, so importing pml's base class fell through to the
triplet branch and passed a 2-D projection as the 1-D label
arg ("labels must be a 1D tensor").
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All-channels mode is undefined when experiments carry different channel sets: norm_meta collapses to one channel per cell after dedup, and _scatter_channels zips the marker union positionally against a single experiment's patch. Bag-of-channels (channels_per_sample=1) is the supported path — one channel per sample keyed channel_0. - _C 2 -> 1, channel_dropout_channels [1] -> [0] - drop bogus collection_path kwarg from the parquet case Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # applications/dynaclr/tests/test_multi_experiment_integration.py
…549 any-GPU test) (#446) * feat(dynacell/eval): add run_for_group cross-condition probe entry point pipeline.py imports run_for_group from cross_condition_probe to write a per-condition mock-vs-infected probe CSV into each condition's eval dir, but the function was never committed. As a result dynacell.evaluation.pipeline was unimportable in a clean checkout (ImportError: cannot import name 'run_for_group'), so the grouped and parallel eval tests failed at collection in CI. Add run_for_group (and the _write_rows helper) matching the pipeline call site: it runs the existing pairwise probe per infected condition against mock and writes one CSV per eval dir, colocated with the dir the reporting layer already resolves per (model, pool, organelle, condition). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(dynacell): unpin predict from H200 via any-GPU launcher profile Predict jobs were pinned to H200 by every leaf composing hardware_h200_single.yml. Predict at FP32 fits in ~7 GB and is compute-bound (measured 6.6 GB / 100% SM / 15-20% bandwidth on celldiff a549-trained membrane mock, H200), so Hopper HBM/bandwidth is unused — A40 / A6000 / L40S / L4 all run it and drain the queue faster. Add hardware_predict_any_gpu.yml (same resource shape as hardware_h200_single, constraint=null) and swap all 233 tracked predict leaves across celldiff, fcmae_vscyto3d_pretrained, fcmae_vscyto3d_scratch, fnet3d_paper, pix2pix3d_unetvit, and unetvit3d to use it. Train leaves stay on hardware_h200_single / hardware_4gpu (memory + bandwidth profile differs). README updated. test_a549_predict_leaf_composes now asserts sbatch.constraint is None (any GPU) instead of "h200". This is the config-side fix for the 24 a549 composition failures #435 left open by re-pinning predicts to H200. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(dynacell/eval): tidy run_for_group (reuse _DEFAULT_PAIRS, drop dead param) Post-review cleanups, no behavior change: - Iterate _DEFAULT_PAIRS instead of hardcoding ("denv","zikv") + building ("mock", cond) by hand — keeps the mock-vs-infected pair set in one place, so a future infected condition is picked up automatically. - Drop the unused filename parameter (no caller passes it) and write to GROUP_PROBE_FILENAME directly. - Drop the needless list() around the _FIELDNAMES tuple in _write_rows (csv.DictWriter accepts any sequence). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(dynacell): assert predict constraint key present-and-null test_a549_predict_leaf_composes used .get("constraint") is None, which also passes if the hardware profile is dropped from the leaf's base chain (missing key -> None). Subscript the key so a dropped any-GPU profile fails loudly instead of silently — the any-GPU profile composes an explicit constraint=null, so the key is always present when applied. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Alexandr Kalinin <alxndrkalinin@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… cross-condition probe hardening (#447) * perf(dynacell/eval): memoize embedding loads across cross-condition pairs _probe_pair re-read each NPZ from disk on every call, so the shared mock reference side was loaded once per pair: run_for_group read the 8 mock files (4 features x 2 sources) 16 times, and run re-read each condition once per pair. Thread an optional per-call dict cache keyed on the NPZ path through _probe_pair -> _load_embeddings; run / run_for_group create one and pass it so each file is read once per group. Measured on a synthetic 3-condition group: run_for_group drops from 32 to 24 np.load reads (~26 MB of redundant mock NPZ I/O per group avoided). The cache is local to each call and released on return, so memory stays bounded to one group's embeddings rather than accumulating across groups (which a module-level lru_cache would do). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(dynacell/eval): raise on duplicate condition in run_for_group run_for_group built its condition->dir map with a last-wins assignment, so two dirs mapping to the same condition (e.g. two *_mock dirs) silently overwrote each other and produced a probe against an arbitrary dir with no signal. run() already raises ValueError on duplicates; mirror that here so an ambiguous group surfaces as an error (the grouped pipeline catches it and safely skips the probe). Unrecognized-token dirs are still skipped, as documented. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(dynacell/eval): lazily import cubic in instance_metrics instance_metrics imported cubic.metrics.average_precision at module top level, and pipeline_cache imports DEFAULT_IOU_THRESHOLDS from it — so the whole pipeline import chain pulled the GPU-only cubic stack. The minimal `test-dynacell-configs` CI job (uv sync --group test, no eval extra) then failed to import dynacell.evaluation.pipeline with `ModuleNotFoundError: No module named 'cubic'`, taking down every grouped / parallel / parallel_cpu eval test at collection. Move the cubic import into instance_average_precision (its only use), so importing the module for the threshold constant no longer needs cubic. The AP computation still hard-requires cubic and fails loudly there. Mirrors the lazy torch_fidelity imports in feature_metrics. Verified: dynacell.evaluation.pipeline now imports with cubic unavailable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Alexandr Kalinin <alxndrkalinin@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 21 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
packages/viscy-data/tests/test_hcs.py:1010
- After
proc.join(timeout=30), the code immediately callsresult_queue.get_nowait(). If the child process is still running (hung) or the queue write is slightly delayed, this raises and makes the test flaky (and can also leave a runaway child). Add anis_alive()guard and use a boundedget(timeout=...).
proc.start()
proc.join(timeout=30)
status, value = result_queue.get_nowait()
assert status == "ok", f"Child process failed: {value}"
The test-dynacell-configs job installed only base deps + the test group, so the eval-pipeline tests (which import dynacell.evaluation.pipeline) failed with `ModuleNotFoundError: No module named 'cubic'` once the import chain reached instance_metrics / segmentation. cubic is a *hard* dependency of the eval pipeline (evaluation/CLAUDE.md: do not hide its imports), and it is CPU-capable — it falls back to numpy/scikit-image without CUDA — so the right fix is to install it, not to lazily defer its import. - CI: `uv sync --frozen --extra eval --group test` (CPU; eval_gpu cupy/cucim stays out — those are the only CUDA-only deps). - Revert the lazy `from cubic.metrics import average_precision` in instance_metrics back to a module-level import (restores the convention; the deferral added earlier is no longer needed now that cubic is present). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
torch-fidelity is a hard dependency of the eval extra (and the test-dynacell-configs CI job now installs that extra), so there is no reason to defer its imports into _fid/_kid/_mind/_bootstrap_prc. Move fid/kid/mind/prc helper imports back to the module top, matching the convention that eval deps are imported eagerly and fail loud if absent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…back _require_cubic (in metrics.py and io.py) raised "<pkg> is required for GPU-backed metrics … uv sync --extra eval_gpu" whenever cupy/cucim were absent — even on the CPU path. That defeated cubic's own numpy/scikit-image fallback: compute_pixel_metrics and _cp_raw_regionprops already gate the GPU upload on `torch.cuda.is_available()` and use asnumpy/numpy otherwise, where cubic dispatches to CPU. The eager gate fired before that code ran, so a GPU-less runner with the eval extra (but no eval_gpu) — e.g. GitHub CI — failed instead of falling back. Drop the cupy/cucim requirement from both _require_cubic; keep the cubic presence check. On an actual GPU run without cupy, `ascupy` still raises a clear "GPU requested but not available". Verified: the eval-pipeline + grouped tests pass with the GPU hidden (CUDA_VISIBLE_DEVICES="") across the serial and spawn-process paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_load_embeddings keyed its memoization cache on the unresolved eval_dir/embeddings/<f>.npz path while the docstring promised the *resolved* path — a doc/code mismatch that could also defeat caching when the same file is reached via different relative paths or symlinks. Resolve the path (strict=False, so a missing file still surfaces as np.load's FileNotFoundError) before the cache lookup and load. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code |
alxndrkalinin
left a comment
There was a problem hiding this comment.
Looks good to me!
There was a problem hiding this comment.
@alxndrkalinin I think this is good. any straggling iohub related things otherwise?
|
@srivarra not that I can think of |
Summary
Restore green CI on
modular-viscy-stagingso PR #373 (the 0.5 merge tomain) can land. Four atomic commits, each fixing one distinct failure class observed on the #373 CI run.What's broken on
modular-viscy-stagingtodayAfter pushing the merge commit to
modular-viscy-staging, PR #373 ran CI for the first time and surfaced five pre-existing failure classes (none caused by the merge itself):uv syncerrors on Python 3.11requires-python = ">=3.12". CI matrix still includes 3.11 →fail-fast: truecancels every 3.12/3.13 sibling.ModuleNotFoundError: No module named 'viscy_transforms'packages/viscy-data/tests/test_combined.py(+ 2 others) importviscy_transforms, but thetestgroup does not declare it.ModuleNotFoundError: No module named 'hydra'applications/dynacell/src/dynacell/evaluation/pipeline.py:9importshydraunconditionally, buthydra-coreis gated behind theeval/reportextras. CI installs withuv sync --group testonly.FileNotFoundError: /home/eduardo.hirata/.../test_2d_mip_mixed.parquetapplications/dynaclr/scripts/dataloader_inspection/test_2d_mip_augmentation.pyis a Jupyter-style demo (# %%cells, hard-coded/home/eduardo.hirata/...path) named with thetest_prefix, so pytest collects and crashes on it.ruff format,.envrcmissing trailing newline, 1 E741, 3 D100, 1 D102, 1 NPY002.Commits in this PR
ci: drop Python 3.11 from matrix and trigger on modular-viscy-staging—.github/workflows/test.yml,.github/workflows/lint.yml. Removes3.11from the matrix, addsmodular-viscy-stagingto push/pull_request triggers so future PRs targeting the 0.5 staging branch run lint/test.fix(deps): declare viscy-transforms test dep and hydra-core for dynacell— Promoteshydra-core>=1.2to dynacell's basedependencies(it's imported unconditionally), addsviscy-transformstoviscy-data's test group. Regenerateduv.lock.fix(dynaclr): rename test_2d_mip_augmentation.py to demo_; scope pytest to tests/— Renames the demo script todemo_2d_mip_augmentation.py, addstestpaths = ["tests"]under[tool.pytest.ini_options]inapplications/dynaclr/pyproject.tomlso future demo scripts underscripts/are not collected when CI invokes pytest fromapplications/dynaclr/.style: satisfy pre-commit (ruff format, EOF, D100/D102, E741, NPY002)— Auto-fixes fromuvx prek run --all-filesplus the 7 manual lint fixes.Test plan
Test*jobs).