Support Zarr and OME-Zarr inputs in the track CLI #2
Merged
The track CLI loader only handled single files and folders of frames via the optional bioio extra. Broaden hoct._io.load_array to read three input families with always-available dependencies:

- Single TIFF, read with tifffile (first channel, length-1 axes collapsed).
- Simple Zarr arrays, with length-1 axes collapsed.
- OME-Zarr (t, c, z, y, x) multiscale groups: highest-resolution level, first channel kept, length-1 axes collapsed (c=1/z=1 -> (T, Y, X)).

bioio is now only a fallback for other single-file formats. The track command's layout check uses is_frame_folder, so a .zarr store (which is a directory) is treated as a single input rather than a frame folder. tifffile and zarr are promoted to explicit dependencies since they are now imported directly. Adds unit tests for the new loader paths and an end-to-end track test on OME-Zarr inputs.
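The OME-Zarr path above can be sketched in two small steps: pick the highest-resolution level from the multiscales metadata, then keep the first channel and collapse length-1 axes. This is a minimal sketch, not hoct's code. The helper names are hypothetical, and it assumes OME-NGFF v0.4-style attributes (for example a group's `.zattrs` loaded as a dict), where `datasets` is listed from highest to lowest resolution. It uses only integer indexing and `.squeeze()`, which dask arrays also support, so the same calls would stay lazy on a level opened with `dask.array.from_zarr`.

```python
import numpy as np


def highest_resolution_path(attrs):
    """Return the dataset path of the highest-resolution level.

    Assumes OME-NGFF v0.4-style attributes, where multiscales[0]["datasets"]
    is ordered from highest to lowest resolution.
    """
    datasets = attrs["multiscales"][0]["datasets"]
    return datasets[0]["path"]


def first_channel_collapsed(level):
    """Hypothetical helper: reduce a (t, c, z, y, x) level to its first channel.

    Uses plain indexing plus .squeeze() (no .take), so a dask-backed level
    would remain lazy.
    """
    if level.ndim != 5:
        raise ValueError(f"expected (t, c, z, y, x), got shape {level.shape}")
    first = level[:, 0]      # integer index drops the c axis -> (t, z, y, x)
    return first.squeeze()   # drop remaining length-1 axes, e.g. z=1 -> (t, y, x)


attrs = {"multiscales": [{"datasets": [{"path": "0"}, {"path": "1"}]}]}
level = np.zeros((5, 2, 1, 64, 64))   # t=5, c=2, z=1
print(highest_resolution_path(attrs), first_channel_collapsed(level).shape)
```

Because `.squeeze()` drops every length-1 axis, a single-timepoint input would also lose its T axis in this sketch; whether the real loader guards against that is not stated in the PR.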
Avoid materializing whole datasets in memory:

- Zarr arrays and OME-Zarr levels are read with dask.array.from_zarr; the channel selection and length-1 axis collapse run on the lazy array (using indexing and .squeeze, since dask arrays have no .take).
- Folders of single-suffix TIFFs are stacked lazily with dask.array.image.imread (one frame per chunk), with per-frame shape validation done from TIFF headers (no pixel reads). Non-TIFF and mixed-suffix folders keep the eager fallback.

create_graph already accepts dask arrays, so these flow end to end. Adds laziness assertions to the loader tests.
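The choice between the lazy TIFF-folder path and the eager fallback can be sketched as a small planning step plus a header-only shape check. The helper names, the hidden-file filter, and the case-insensitive TIFF suffix match are assumptions for illustration. `tifffile.TiffFile` and `dask.array.image.imread` are real APIs, but how hoct wires them together is not shown in the PR.

```python
from pathlib import Path

# Suffixes treated as TIFF; matched case-insensitively (assumption).
TIFF_SUFFIXES = {".tif", ".tiff"}


def plan_folder_load(folder):
    """Hypothetical helper: ("lazy", files) for a single-suffix TIFF folder, else ("eager", files)."""
    folder = Path(folder)
    # Ignore subdirectories and hidden files such as .DS_Store.
    files = sorted(p for p in folder.iterdir() if p.is_file() and not p.name.startswith("."))
    if not files:
        raise ValueError(f"no frame files found in {folder}")
    suffixes = {p.suffix for p in files}
    if len(suffixes) == 1 and files[0].suffix.lower() in TIFF_SUFFIXES:
        # Lazy path: a glob such as str(folder / f"*{files[0].suffix}") can be passed
        # to dask.array.image.imread, which stacks one frame per chunk.
        return "lazy", files
    return "eager", files


def common_frame_shape(files):
    """Check that all TIFF frames share one shape, reading headers only."""
    import tifffile  # third-party; only needed on the lazy TIFF path

    shapes = set()
    for path in files:
        with tifffile.TiffFile(path) as tif:
            shapes.add(tif.series[0].shape)  # from tag metadata, no pixel decode
    if len(shapes) != 1:
        raise ValueError(f"frames have inconsistent shapes: {sorted(shapes)}")
    return shapes.pop()
```

Keeping the planning step free of third-party imports makes the lazy-versus-eager decision cheap to unit-test on empty placeholder files.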
Adds Zarr and OME-Zarr support to the `hoct track` image loader, which previously only read single files / folders of frames via the optional `bioio` extra. `hoct._io.load_array` now reads three input families with always-available deps:

- Single TIFF (via `tifffile`)
- Simple Zarr arrays (length-1 axes collapsed)
- OME-Zarr `(t, c, z, y, x)` groups: highest-res level, first channel kept, length-1 axes collapsed

Zarr stores and folders of TIFFs load lazily as dask arrays, so large datasets aren't fully materialized: Zarr via `dask.array.from_zarr` (the collapse runs on the lazy array), TIFF folders via `dask.array.image.imread` (one frame per chunk, shapes validated from headers). `create_graph` already accepts dask arrays, so these flow end to end. Other single-file formats fall back to `bioio`.

`tifffile`/`zarr` are promoted to explicit dependencies. The layout check uses `is_frame_folder`, so a `.zarr` store is treated as a single input, not a frame folder.

New unit tests cover each loader path and assert laziness; an end-to-end `track` run exercises OME-Zarr and TIFF-folder inputs. Full suite (125 tests) passes.
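A directory check in the spirit of `is_frame_folder` might look like the sketch below: a directory counts as a frame folder only when it is not a Zarr store. The PR does not show hoct's implementation; `is_zarr_store`, `looks_like_frame_folder`, and the marker-file heuristic (a `.zarr` suffix, or `.zgroup`/`.zarray`/`zarr.json` inside) are hypothetical.

```python
from pathlib import Path

# Files whose presence marks a directory as a Zarr store:
# .zgroup / .zarray for Zarr v2, zarr.json for Zarr v3.
ZARR_MARKERS = (".zgroup", ".zarray", "zarr.json")


def is_zarr_store(path) -> bool:
    """Hypothetical helper: True if `path` is a directory that looks like a Zarr store."""
    path = Path(path)
    if not path.is_dir():
        return False
    return path.suffix.lower() == ".zarr" or any((path / name).is_file() for name in ZARR_MARKERS)


def looks_like_frame_folder(path) -> bool:
    """Hypothetical stand-in for is_frame_folder: a directory that is not a Zarr store."""
    path = Path(path)
    return path.is_dir() and not is_zarr_store(path)
```

Checking marker files as well as the suffix means a store saved without the `.zarr` extension would still be routed to the single-input path.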