
Support Zarr and OME-Zarr inputs in the track CLI #2

Merged: JoOkuma merged 2 commits into main from improving-cli-io on Jun 15, 2026

@JoOkuma JoOkuma commented Jun 15, 2026

Member

Adds Zarr and OME-Zarr support to the hoct track image loader, which previously read only single files or folders of frames, both via the optional bioio extra.

hoct._io.load_array now reads three input families with always-available deps:

  • Single TIFF (via tifffile)
  • Simple Zarr arrays
  • OME-Zarr (t, c, z, y, x) groups — highest-res level, first channel kept, length-1 axes collapsed
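
The OME-Zarr handling in the last bullet can be sketched as below. This is a minimal illustration, not the code in hoct._io: the function name `collapse_tczyx` is hypothetical, NumPy stands in for the lazy array, and it assumes every length-1 axis is dropped, including a length-1 time axis. The same indexing and `.squeeze()` calls are also available on dask arrays.

```python
import numpy as np


def collapse_tczyx(arr):
    """Keep the first channel of a (t, c, z, y, x) array and drop length-1 axes.

    Hypothetical helper for illustration only; not the hoct implementation.
    """
    if arr.ndim != 5:
        raise ValueError(f"expected 5 axes (t, c, z, y, x), got {arr.ndim}")
    first_channel = arr[:, 0]  # plain indexing gives (t, z, y, x)
    return first_channel.squeeze()  # removes every remaining length-1 axis


single_plane = collapse_tczyx(np.zeros((5, 1, 1, 8, 8)))
volume = collapse_tczyx(np.zeros((5, 3, 4, 8, 8)))
print(single_plane.shape, volume.shape)  # (5, 8, 8) (5, 4, 8, 8)
```

With c=1 and z=1 the result is (T, Y, X); with a real z axis the volume stays (T, Z, Y, X).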

Zarr stores and folders of TIFFs load lazily as dask arrays so large datasets aren't fully materialized: Zarr via dask.array.from_zarr (collapse runs on the lazy array), TIFF folders via dask.array.image.imread (one frame per chunk, shapes validated from headers). create_graph already accepts dask arrays, so these flow end to end. Other single-file formats fall back to bioio.

tifffile/zarr are promoted to explicit dependencies. The layout check uses is_frame_folder so a .zarr store is treated as a single input, not a frame folder.
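
For illustration, a layout check of this kind might look like the sketch below. The real is_frame_folder is not shown in this PR, so the function name `looks_like_frame_folder`, the `.zarr` suffix test, and the metadata-file test are all assumptions. The marker names are the standard Zarr metadata files (`.zarray` and `.zgroup` for v2, `zarr.json` for v3).

```python
import tempfile
from pathlib import Path

# Metadata files that mark a directory as a Zarr array or group.
ZARR_MARKERS = (".zarray", ".zgroup", "zarr.json")


def looks_like_frame_folder(path):
    """Return True for a plain directory of frames, False for files and Zarr stores.

    Hypothetical stand-in for is_frame_folder; the actual check may differ.
    """
    path = Path(path)
    if not path.is_dir():
        return False  # a single file is never a frame folder
    if path.suffix.lower() == ".zarr":
        return False  # a .zarr directory is one input, not a stack of frames
    return not any((path / marker).exists() for marker in ZARR_MARKERS)


with tempfile.TemporaryDirectory() as tmp:
    frames = Path(tmp) / "frames"
    frames.mkdir()
    (frames / "t000.tif").touch()

    store = Path(tmp) / "movie.zarr"
    store.mkdir()
    (store / ".zgroup").touch()

    results = (looks_like_frame_folder(frames), looks_like_frame_folder(store))

print(results)  # (True, False)
```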

New unit tests cover each loader path and assert laziness; an end-to-end track run exercises OME-Zarr and TIFF-folder inputs. Full suite (125 tests) passes.

JoOkuma added 2 commits June 15, 2026 13:20
The track CLI loader only handled single files and folders of frames via
the optional bioio extra. Broaden hoct._io.load_array to read three input
families with always-available dependencies:

- Single TIFF, read with tifffile (first channel, length-1 axes collapsed).
- Simple Zarr arrays, with length-1 axes collapsed.
- OME-Zarr (t, c, z, y, x) multiscale groups: highest-resolution level,
  first channel kept, length-1 axes collapsed (c=1/z=1 -> (T, Y, X)).

bioio is now only a fallback for other single-file formats. The track
command's layout check uses is_frame_folder so a .zarr store (a directory)
is treated as a single input rather than a frame folder.

tifffile and zarr are promoted to explicit dependencies since they are now
imported directly. Adds unit tests for the new loader paths and an
end-to-end track test on OME-Zarr inputs.

Avoid materializing whole datasets in memory:

- Zarr arrays and OME-Zarr levels are read with dask.array.from_zarr; the
  channel selection and length-1 axis collapse run on the lazy array (using
  indexing and .squeeze, since dask arrays have no .take).
- Folders of single-suffix TIFFs are stacked lazily with
  dask.array.image.imread (one frame per chunk), with per-frame shape
  validation done from TIFF headers (no pixel reads). Non-TIFF / mixed-suffix
  folders keep the eager fallback.

create_graph already accepts dask arrays, so these flow end to end. Adds
laziness assertions to the loader tests.
@JoOkuma JoOkuma merged commit cabe8fd into main Jun 15, 2026
3 checks passed