Merged
27 commits
24c79f0
Initial implementation of round trip testing between Java and Python
cmalinmayor Dec 4, 2025
ad440c5
Add note on how to run tests in readme
cmalinmayor Dec 4, 2025
55b3891
Update logback version to handle CVE warnings
ksugar Apr 10, 2026
034d7d8
v1 spec compatibility implementation
ksugar Apr 10, 2026
ca37ef5
Update v1 spec compatibility analysis and plan
ksugar Apr 10, 2026
d1345ea
Implement Variable-length Property
ksugar Apr 10, 2026
46c30c0
update v1 spec compatibility plan for writing varlen props
ksugar Apr 10, 2026
e515270
Implement VarlengthPropertyWriting
ksugar Apr 10, 2026
8f1512d
Fallback to RawCompression if Blosc compression is not available
ksugar May 8, 2026
109beff
Add uv-related files for testing
ksugar May 8, 2026
26c4f23
Refactoring
ksugar May 8, 2026
3ff6e48
Fix five bugs found during code review
ksugar May 8, 2026
8600fef
Fix covariance2d/3d not restored on read, add roundtrip tests
ksugar May 8, 2026
b21271e
Add covariance2d/3d cross-language round-trip test
ksugar May 8, 2026
adb210d
Unify Java indentation to tabs (1 tab per indent level)
ksugar May 8, 2026
563a911
Fix zarr output to pass is_geff_dataset validation
ksugar May 8, 2026
91ad337
Populate props metadata when writeAllProps=true to pass Python valida…
ksugar May 8, 2026
f9cb478
Write zarr arrays in little-endian byte order for Python/pandas compa…
ksugar May 8, 2026
f2a0cc8
Fix byte-order patch for blosc-compressed Zarr arrays
ksugar May 8, 2026
cf811bc
Automatically calculate the chunk size
ksugar Jun 19, 2026
82faad8
Update docs
ksugar Jun 19, 2026
995a53a
Update README
ksugar Jun 19, 2026
0eef18d
Update CITATION.cff
ksugar Jun 19, 2026
0f3366b
Refactoring
ksugar Jun 19, 2026
77dc5a9
Update polygon handling to follow the v1 spec
ksugar Jun 22, 2026
11a3d23
Add supported Zarr format
ksugar Jun 22, 2026
d02d64c
Add methods for custom props and varlenProps
ksugar Jun 22, 2026
4 changes: 4 additions & 0 deletions .gitignore
@@ -16,3 +16,7 @@ buildNumber.properties
.mvn/wrapper/maven-wrapper.jar
/logs/
.vscode/settings.json
.idea/

# Cross-language test data
cross-language-tests/data/*.zarr/
16 changes: 11 additions & 5 deletions CITATION.cff
@@ -3,17 +3,23 @@ title: geff Java
message: If you use this software, please cite it as below.
type: software
authors:
- family-names: Ko
given-names: Sugawara
- family-names: Sugawara
given-names: Ko
orcid: https://orcid.org/0000-0002-1392-9340
- family-names: Tinevez
given-names: Jean-Yves
orcid: https://orcid.org/0000-0002-0998-4718
date-released: 2025-07-17
version: 0.1.0
- family-names: Pietzsch
given-names: Tobias
orcid: https://orcid.org/0000-0002-9477-3957
- family-names: Malin-Mayor
given-names: Caroline
orcid: https://orcid.org/0000-0002-9627-6030
date-released: 2026-06-19
version: 1.0.0
identifiers:
- description: All versions of this software
type: doi
value: undefined
license: BSD-2-Clause
repository-code: https://github.com/mastodon-sc/geff-java
repository-code: https://github.com/live-image-tracking-tools/geff-java
151 changes: 100 additions & 51 deletions README.md
@@ -11,12 +11,14 @@ The **Graph Exchange File Format (GEFF)** is a standardized format for storing a

## Features

- **Full GEFF specification compliance** - Supports Geff versions 0.0, 0.1, 0.2, and 0.3 (including patch versions, development versions, and metadata like 0.2.2.dev20+g611e7a2.d20250719)
- **Zarr-based storage** - Efficient chunked array storage for large-scale tracking data
- **GEFF v1 spec compliance** - Reads and writes GEFF v0.2 through v1.x and beyond; version validation uses a semver pattern rather than an allowlist
- **Zarr Format 2** - Reads and writes [Zarr Format 2](https://zarr-specs.readthedocs.io/en/latest/v2/v2.0.html) only; Zarr Format 3 is not supported
- **Complete data model** - Support for nodes (spatial-temporal features), edges (connections), and metadata
- **Flexible metadata handling** - Axis-based metadata with GeffAxis objects for spatial and temporal dimensions
- **Type safety** - Strong typing with comprehensive validation
- **Memory efficient** - Chunked reading and writing for handling large datasets
- **Flexible metadata handling** - Axis-based metadata with GeffAxis objects; supports `time`, `space`, and `channel` axis types with any axis name
- **Property metadata** - Full `node_props_metadata` / `edge_props_metadata` support as required by the v1 spec
- **Variable-length properties** - Read and write properties with `varlength: true` (e.g. polygon coordinates per node)
- **Type safety** - Strong typing with comprehensive validation; graceful skip with warning for unsupported types (`str`, `bytes`)
- **Adaptive chunking** - Chunk sizes targeting ~8 MiB per chunk (power-of-two on the first dimension)
- **Builder patterns** - Convenient object construction with builder classes for GeffNode and GeffEdge
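
The adaptive chunking bullet above can be illustrated with a minimal sketch. It assumes that trailing dimensions are kept whole and that the first dimension is rounded down to a power of two so one chunk stays at or below 8 MiB. The class name `ChunkSizeSketch`, the method `chunkShape`, and the exact rounding rule are hypothetical and not taken from geff-java.

```java
// Hypothetical sketch: not the geff-java implementation.
final class ChunkSizeSketch {
	/** Target chunk size in bytes (8 MiB, as stated in the feature list). */
	static final long TARGET_BYTES = 8L * 1024 * 1024;

	/**
	 * Keeps trailing dimensions whole and picks the largest power of two for
	 * the first dimension such that one chunk is at most TARGET_BYTES.
	 * Assumed rule; the result is not clamped to the dataset length.
	 */
	static int[] chunkShape(long[] shape, int bytesPerElement) {
		if (shape.length == 0) {
			throw new IllegalArgumentException("shape must have at least one dimension");
		}
		int[] chunk = new int[shape.length];
		long rowBytes = bytesPerElement;
		for (int d = 1; d < shape.length; d++) {
			chunk[d] = (int) Math.max(1, shape[d]);
			rowBytes *= chunk[d];
		}
		// Number of first-dimension entries that fit in the target, at least one.
		long maxRows = Math.max(1, TARGET_BYTES / rowBytes);
		// Largest power of two that is <= maxRows.
		chunk[0] = (int) Long.highestOneBit(maxRows);
		return chunk;
	}
}
```

Under these assumptions, a `[N, 4]` array of 4-byte floats has 16 bytes per row, so the first-dimension chunk is 8388608 / 16 = 524288 rows.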

## Core Classes
@@ -25,40 +27,51 @@ The **Graph Exchange File Format (GEFF)** is a standardized format for storing a
Represents nodes in tracking graphs with spatial and temporal attributes:
- Time point information (`t` property)
- Spatial coordinates (x, y, z)
- Segment identifiers
- Segment identifiers (dynamic property name via metadata)
- Additional properties: color, radius, covariance2d, covariance3d
- Polygon geometry: separate polygonX and polygonY coordinate arrays with polygon offset for serialization
- Polygon geometry stored via `polygonX`/`polygonY` builder fields, serialized to `serialized_props/polygon/`
- Variable-length properties accessible via `getVarlengthProperty(name)` / `setVarlengthProperty(name, ...)`
- Builder pattern for convenient object construction
- Chunked Zarr I/O support for versions 0.1, 0.2, and 0.3
- Chunked Zarr Format 2 I/O

### GeffEdge
### GeffEdge
Represents connections between nodes in tracking graphs:
- Source and target node references
- Edge properties: score, distance
- Builder pattern for convenient object construction
- Chunked storage for efficient large-scale edge data
- Support for different Geff version formats

### GeffAxis
Represents axis metadata for spatial and temporal dimensions:
- Predefined constants for common axis names (t, x, y, z)
- Type classifications (time, space)
- Type classifications: `time`, `space`, `channel`
- Unit specifications with common constants
- Optional min/max bounds for ROI definition

### GeffMetadata
Handles GEFF metadata with schema validation:
- Version validation via semver pattern (accepts any well-formed version)
- GeffAxis array supporting any axis name and the three axis types
- Node/edge property metadata maps (`nodePropsMetadata`, `edgePropsMetadata`)
- Dynamic tracklet property name from `track_node_props["tracklet"]`
- Graph properties (directed/undirected)
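
Pattern-based version validation, as opposed to an allowlist, might look like the sketch below. The regular expression is an assumption chosen to accept plain versions such as `1.0.0` as well as development versions such as `0.2.2.dev20+g611e7a2.d20250719`. The pattern geff-java actually uses is not shown in this diff, and `VersionPatternSketch` is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the real pattern in GeffMetadata may differ.
final class VersionPatternSketch {
	private static final Pattern VERSION = Pattern.compile(
			"^\\d+\\.\\d+(\\.\\d+)?"      // major.minor[.patch]
			+ "([.-][0-9A-Za-z]+)*"        // optional pre-release/dev segments, e.g. .dev20
			+ "(\\+[0-9A-Za-z.-]+)?$");    // optional build metadata, e.g. +g611e7a2.d20250719

	/** Returns true for any well-formed version string, regardless of its numbers. */
	static boolean isWellFormed(String version) {
		return version != null && VERSION.matcher(version).matches();
	}
}
```

The design choice is that a future `2.3.1` passes without a code change, while malformed strings such as `abc` or `1` are still rejected.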

### PropMetadata
Describes a single node or edge property as required by the v1 spec:
- Fields: `identifier`, `dtype`, `varlength`, `unit`, `name`, `description`
- Used to infer data types on write and skip unsupported types on read
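
Skipping unsupported types on read could be sketched as follows. This is only an illustration: it represents property metadata as a plain map from property name to `dtype` string, and `DtypeFilterSketch` and `readableProps` are hypothetical names rather than geff-java API. Only `str` and `bytes` are treated as unsupported, matching the types named in the feature list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: geff-java uses PropMetadata objects, not a plain map.
final class DtypeFilterSketch {
	private static final Set<String> UNSUPPORTED =
			new HashSet<>(Arrays.asList("str", "bytes"));

	/** Returns the property names that can be read, warning about skipped ones. */
	static List<String> readableProps(Map<String, String> dtypeByProp) {
		List<String> readable = new ArrayList<>();
		for (Map.Entry<String, String> e : dtypeByProp.entrySet()) {
			if (UNSUPPORTED.contains(e.getValue())) {
				// Warn and continue instead of failing the whole read.
				System.err.println("Skipping property '" + e.getKey()
						+ "' with unsupported dtype " + e.getValue());
				continue;
			}
			readable.add(e.getKey());
		}
		return readable;
	}
}
```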

### VarlengthProperty
Stores a variable-length property (one array per node with potentially different shapes):
- `getNodeData(int nodeIndex)` – extract the data array for a specific node
- `isMissing(int nodeIndex)` – check the optional missing-value indicator
- Backed by a flattened data array and an offset/shape index
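
The flattened-data-plus-index layout can be illustrated with a small sketch. It assumes each index row holds `{offset, shape_0, ..., shape_{ndim-1}}` in that order and handles `double` values only. Both are assumptions for illustration: `VarlengthSketch` is a hypothetical class, not the actual `VarlengthProperty`.

```java
import java.util.Arrays;

// Hypothetical sketch of a variable-length property; row layout is assumed.
final class VarlengthSketch {
	private final double[] data;   // flattened values of all nodes, length V
	private final long[][] index;  // one row per node: {offset, shape_0, ...}

	VarlengthSketch(double[] data, long[][] index) {
		this.data = data;
		this.index = index;
	}

	/** Copies the flattened values that belong to one node. */
	double[] nodeData(int nodeIndex) {
		long[] row = index[nodeIndex];
		int offset = (int) row[0];
		int length = 1;
		for (int d = 1; d < row.length; d++) {
			length *= (int) row[d]; // number of elements = product of the shape
		}
		return Arrays.copyOfRange(data, offset, offset + length);
	}
}
```

For example, a triangle stored with shape `[3, 2]` at offset 0 and a two-vertex polyline with shape `[2, 2]` at offset 6 share one `data` array of ten values.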

### GeffSerializableVertex
Lightweight geometry class internally used for storing polygon vertex coordinates:
- Simple (x, y) coordinate storage
- Part of the geometry package for efficient polygon handling

### GeffMetadata
Handles Geff metadata with schema validation:
- Version compatibility checking with pattern matching for development versions
- GeffAxis array for spatial/temporal metadata
- Graph properties (directed/undirected)
- Comprehensive validation with detailed error messages
- Support for multiple Geff version formats (0.1, 0.2, 0.3)

### Geff
Main utility class demonstrating library usage and providing examples.

@@ -69,8 +82,11 @@ Main utility class demonstrating library usage and providing examples.

## Dependencies

- **jzarr 0.3.5** - Zarr format support for Java
- **ucar.ma2** - Multi-dimensional array operations
- **n5** - N5/Zarr core data model
- **n5-zarr** - Zarr format reader/writer
- **n5-blosc** - Optional Blosc compression (falls back to raw compression if the native library is absent)
- **imglib2** - Multi-dimensional array and interval utilities
- **slf4j-api** - Logging facade

## Usage Example

@@ -118,7 +134,7 @@ GeffNode node1 = new GeffNode.Builder()
newNodes.add(node1);

// Write to Zarr format with version specification
GeffNode.writeToZarr(newNodes, "/path/to/output.zarr/tracks", "0.4.0");
GeffNode.writeToZarr(newNodes, "/path/to/output.zarr/tracks", "1.0.0");

// Create new edges using builder pattern
List<GeffEdge> newEdges = new ArrayList<>();
@@ -132,7 +148,14 @@ GeffEdge edge = new GeffEdge.Builder()
newEdges.add(edge);

// Write to Zarr format
GeffEdge.writeToZarr(newEdges, "/path/to/output.zarr/tracks", "0.4.0");
GeffEdge.writeToZarr(newEdges, "/path/to/output.zarr/tracks", "1.0.0");

// Access variable-length properties after reading (e.g. per-node polygon)
List<GeffNode> readNodes = GeffNode.readFromZarr("/path/to/data.zarr/tracks");
VarlengthProperty polygon = readNodes.get(0).getVarlengthProperty("polygon");
if (polygon != null) {
Object nodeData = polygon.getNodeData(0); // double[] or int[] for node 0
}

// Create metadata with axis information
GeffAxis[] axes = {
@@ -141,53 +164,76 @@ GeffAxis[] axes = {
new GeffAxis(GeffAxis.NAME_SPACE_Y, GeffAxis.TYPE_SPACE, GeffAxis.UNIT_MICROMETER, 0.0, 1024.0),
new GeffAxis(GeffAxis.NAME_SPACE_Z, GeffAxis.TYPE_SPACE, GeffAxis.UNIT_MICROMETER, 0.0, 100.0)
};
GeffMetadata metadata = new GeffMetadata("0.4.0", true, axes);
GeffMetadata metadata = new GeffMetadata("1.0.0", true, axes);
GeffMetadata.writeToZarr(metadata, "/path/to/output.zarr/tracks");
```

## Building

geff-java prefers `BloscCompression` for writing datasets when [c-blosc](https://github.com/Blosc/c-blosc) is available. If Blosc is not installed, it will print a warning and automatically fall back to `RawCompression`.

```bash
mvn clean compile
mvn test
mvn package
```
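
The Blosc fallback described in this section could follow a pattern like the one below. The sketch deliberately avoids the n5 classes and uses generic suppliers, so `CompressionFallbackSketch` and `preferOrFallback` are hypothetical names. How geff-java actually detects a missing native library is not shown in this diff.

```java
import java.util.function.Supplier;

// Hypothetical sketch: stands in for choosing BloscCompression vs RawCompression.
final class CompressionFallbackSketch {
	/**
	 * Tries the preferred factory first; if it fails, for example with an
	 * UnsatisfiedLinkError because a native library is absent, prints a
	 * warning and uses the fallback instead.
	 */
	static <T> T preferOrFallback(Supplier<T> preferred, Supplier<T> fallback) {
		try {
			return preferred.get();
		} catch (LinkageError | RuntimeException e) {
			System.err.println("Preferred compression unavailable (" + e
					+ "); falling back.");
			return fallback.get();
		}
	}
}
```

A caller would pass a supplier that constructs and probes the Blosc-based compression as `preferred` and one that constructs the raw compression as `fallback`.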

## Cross-Language Tests

Round-trip tests validate interoperability between geff-java and the Python reference implementation.

**Requirements:**
- [uv](https://docs.astral.sh/uv/) (Python package manager)

**Optional compression support:**
- If [c-blosc](https://github.com/Blosc/c-blosc) is installed, geff-java will use `BloscCompression` for output datasets.
- If Blosc is not available, geff-java prints a warning and automatically falls back to `RawCompression`, so end users can still run the library and the cross-language tests without extra native setup.
- On macOS, Blosc can be installed with `brew install c-blosc`.

**Run tests:**
```bash
mvn package -DskipTests
cd cross-language-tests
uv run run_tests.py
```

The tests create GEFF files with Python, read/write them with Java, and validate the results.

## Data Format

The library follows the Geff specification for biological tracking data:

```
dataset.zarr/
├── .zgroup # Zarr group metadata
├── .zattrs # Geff metadata (version, spatial info, etc.)
├── .zgroup
├── .zattrs # Geff metadata:
│ # version, directed, axes,
│ # node_props_metadata,
│ # edge_props_metadata,
│ # track_node_props
└── tracks/
├── .zgroup
├── nodes/
│ ├── .zgroup
│ ├── props/ # For Geff 0.2/0.3 format
│ │ ├── t/ # Time points [N]
│ │ ├── x/ # X coordinates [N]
│ │ ├── y/ # Y coordinates [N]
│ │ ├── z/ # Z coordinates [N] (optional)
│ │ ├── color/ # Node colors [N] (optional)
│ │ ├── radius/ # Node radii [N] (optional)
│ │ ├── track_id/ # Track identifiers [N] (optional)
│ │ ├── covariance2d/ # 2D covariance matrices for ellipse serialized in 1D [N, 4] (optional)
│ │ ├── covariance3d/ # 3D covariance matrices for ellipsoid serialized in 1D [N, 6] (optional)
│ │ └── polygon/ # Polygon coordinates (optional)
│ │ ├── slices/ # Polygon slices with startIndex and endIndex [N, 2] (optional)
│ │ └── values/ # XY coordinates of vertices in polygons [numVertices, 2] (optional)
│ └── ids/
│ └── 0 # Node ID chunks
│ ├── ids/ # Node IDs [N]
│ ├── props/
│ │ ├── t/values # Time points [N]
│ │ ├── x/values # X coordinates [N]
│ │ ├── y/values # Y coordinates [N]
│ │ ├── z/values # Z coordinates [N] (optional)
│ │ ├── color/values # RGBA colors [N, 4] (optional)
│ │ ├── radius/values # Node radii [N] (optional)
│ │ ├── <tracklet>/values # Track IDs [N] (name from track_node_props, optional)
│ │ ├── covariance2d/values # Flattened 2D covariance [N, 4] (optional)
│ │ ├── covariance3d/values # Flattened 3D covariance [N, 6] (optional)
│ │ └── <varlength_prop>/ # Variable-length property, e.g. polygon (optional)
│ │ ├── data # Flattened values [V]
│ │ ├── values # Offsets and shapes [N, ndim+1]
│ │ └── missing # Missing-value mask [N] (optional)
└── edges/
├── .zgroup
├── props/ # For Geff 0.2/0.3 format
│ ├── distance/ # Edge distances (optional)
│ └── score/ # Edge scores (optional)
└── ids/
├── 0.0 # Edge chunks (source nodes)
└── 1.0 # Edge chunks (target nodes)
├── ids/ # Source/target node ID pairs [N, 2]
└── props/
├── distance/values # Edge distances [N] (optional)
└── score/values # Edge scores [N] (optional)
```

## Technical Information
@@ -198,7 +244,10 @@ dataset.zarr/

### Contributors

* [Ko Sugawara](https://github.com/ksugar/) - Project maintainer
* [Ko Sugawara](https://github.com/ksugar/)
* [Jean-Yves Tinevez](https://github.com/tinevez)
* [Tobias Pietzsch](https://github.com/tpietzsch)
* [Caroline Malin-Mayor](https://github.com/cmalinmayor)

### License

@@ -217,4 +266,4 @@ dataset.zarr/
## Acknowledgements

* [Geff Python implementation](https://github.com/live-image-tracking-tools/geff) - Original specification and reference implementation
* [jzarr library](https://github.com/bcdev/jzarr) - Zarr format support for Java
* [N5-universe](https://github.com/saalfeldlab/n5) - N5/Zarr I/O libraries used by this implementation
83 changes: 83 additions & 0 deletions cross-language-tests/README.md
@@ -0,0 +1,83 @@
# Cross-Language Round-Trip Tests

This directory contains tests for verifying interoperability between `geff-java` and the Python `geff` reference implementation.

## How It Works

1. Python creates mock GEFF files using `geff.testing.data` (available from the `geff` package)
2. Java reads and re-writes them via `RoundTripGeff`
3. Python compares original vs round-tripped to verify equivalence

## Prerequisites

### Java
```bash
# Build the Java project (from repo root)
cd ..
mvn package -DskipTests
```

### Python
The script uses [uv](https://github.com/astral-sh/uv) for dependency management.

#### Setup Option 1: Using `uv sync` (recommended)
```bash
cd cross-language-tests
uv sync
```

#### Setup Option 2: Run directly (no environment setup needed)
```bash
cd cross-language-tests
uv run run_tests.py
```

Latest versions:
- `geff` (Python package) - see [live-image-tracking-tools/geff](https://github.com/live-image-tracking-tools/geff)
- `geff-spec` - GEFF metadata specification

## Running the Tests

After setting up with `uv sync`:
```bash
cd cross-language-tests
uv run run_tests.py
```

Or activate the virtual environment and run directly:
```bash
cd cross-language-tests
. .venv/bin/activate # or .venv\Scripts\activate on Windows
python run_tests.py
```

## Test Data Generation

The `geff.testing.data` module provides several fixture generators:

- `create_simple_2d_geff()` - 2D graphs with (t, x, y) + edge properties
- `create_simple_3d_geff()` - 3D graphs with (t, x, y, z) + edge properties
- `create_simple_temporal_geff()` - Temporal-only graphs (t dimension only)
- `create_empty_geff()` - Empty graphs (no nodes/edges, useful for edge cases)
- `create_mock_geff()` - Advanced: full control over node/edge properties, dtypes, dimensions

See the [geff.testing.data source](https://github.com/live-image-tracking-tools/geff/blob/main/packages/geff/src/geff/testing/data.py) for advanced usage with custom properties, variable-length arrays, and missing values.

## Test Cases

| Test | Description | Status |
| ------------------ | ---------------------------------------- | ------------------------- |
| `basic_3d` | Simple 3D graph (t, x, y, z) with edges | Check compatibility |
| `basic_2d` | Simple 2D graph (t, x, y) with edges | Check compatibility |
| `temporal_only` | Temporal graph (t only, no spatial dims) | Check temporal handling |
| `empty` | Empty graph (no nodes/edges) | Check edge case handling |
| `varlength_arrays` | Variable-length array properties | Check if supported/warned |
| `missing_values` | Properties with missing value arrays | Check if supported/warned |

## Output

Test data is written to the `data/` directory (git-ignored).

Each test creates:
- `<test>_original.zarr` - Created by Python
- `<test>_roundtrip.zarr` - Read by Java, written by Java
12 changes: 12 additions & 0 deletions cross-language-tests/pyproject.toml
@@ -0,0 +1,12 @@
[project]
name = "geff-java-cross-language-tests"
version = "0.1.0"
description = "Cross-language round-trip tests for geff-java"
requires-python = ">=3.10"
dependencies = [
"geff",
"zarr",
]

[tool.uv]
# This allows uv sync to work properly