SWHID Testing Harness

A technology-neutral testing harness for comparing different SWHID (SoftWare Hash Identifier) implementations on standardized test payloads.

Quick Start

Installation

pip install -e .[dev]

Optional: Install Implementation Dependencies

Some implementations require additional dependencies:

Ruby: gem install swhid (add ~/.gem/ruby/*/bin to PATH)
Rust: Install Rust toolchain if testing Rust implementation
Python: swh.model and swh.core packages are optional but recommended

Basic Usage

# Run all tests with all implementations
swhid-harness --category content --dashboard-output results.json

# Test specific implementations
swhid-harness --impl rust,python --category content

# Test specific categories
swhid-harness --category content,directory,git

Validate Results

# Validate results
python3 -m harness.models results.json

# Generate HTML table (single table, backward compatible)
python3 scripts/view_results.py results.json

# Generate separate tables per variant (recommended for v2)
python3 scripts/view_results.py results.json --output-dir output/

The table generator automatically detects SWHID variants (v1, v2, etc.) and can generate separate tables for each variant, making it easy to compare results across different SWHID versions, hash algorithms, and serialization formats.

Project Structure

swhid-rs-tools/
├── harness/             # Core harness (plugin system, test runner)
├── implementations/     # SWHID implementation plugins
├── payloads/            # Test payloads (content, directory, archive, git)
├── tests/               # Test suite (unit, integration, property, negative)
├── tools/               # Utility scripts (merge_results, json_diff, test scripts)
├── config.yaml          # Configuration (payloads, settings)
├── DEVELOPER_GUIDE.md   # Documentation
└── README.md            # This file

Adding Implementations

Implementations are auto-discovered from implementations/. See Developer Guide for details.

Multiple Git-Based Implementations

The harness includes three Git-based implementations (git-cmd, git (dulwich), and pygit2) that all compute Git hashes. While they produce identical results, each serves a purpose:

Cross-validation: Agreement across different libraries increases confidence in correctness
Availability: Different environments may have different tools available (git CLI, dulwich, or libgit2)
Bug detection: Different libraries may expose edge cases or implementation bugs
Performance comparison: Different backends have different performance characteristics

These implementations are wrappers around Git's hashing algorithm and should always agree. Disagreements indicate bugs in either the harness or the underlying libraries.

Documentation

Implementation Guide - Comprehensive overview of all SWHID implementations, their technology stack, features, and limitations
Developer Guide - Complete guide for running tests and adding implementations
Architecture - System architecture, component design, and data flow
Troubleshooting - Common issues, solutions, and debugging tips

Testing

# Run test suite
pytest

# With coverage
pytest --cov=harness

Configuration

Edit config.yaml to:

Add/modify test payloads
Adjust test settings (timeout, parallelism)
Configure output options

Negative Tests

Negative tests verify that implementations correctly reject invalid inputs. Add tests to the negative category with an expected_error field:

negative:
  - description: Test file that doesn't exist (triggers IO_ERROR)
    expected_error: IO_ERROR
    name: nonexistent_file
    path: payloads/negative/nonexistent_file.txt

A negative test passes when all supporting implementations correctly fail (reject the invalid input). Implementations that don't support the object type are automatically skipped and don't affect the result.

Commit Reference Resolution

The harness automatically resolves branch names, tag names, and short SHAs to full commit SHAs before passing them to implementations. This ensures all implementations receive consistent input regardless of their internal reference resolution capabilities.

Autodiscovery expectations

For Git archives that enable discover_branches and/or discover_tags, provide expected SWHIDs per discovered reference using the expected section:

  - name: comprehensive
    path: payloads/git-repository/comprehensive.tar.gz
    discover_branches: true
    discover_tags: true
    expected:
      branches:
        main: swh:1:rev:...
        feature-a: swh:1:rev:...
      tags:
        v1.0.0: swh:1:rel:...

During a run the harness looks up these values and compares the consensus SWHID from all successful implementations with the configured expectation. Missing entries are reported so new branches/tags can be documented explicitly.

Unsupported payloads

Each implementation declares the SWHID object types it supports. The harness checks those capabilities before running a test and marks incompatible payloads as SKIPPED instead of counting them as failures. The run summary still lists which implementations were skipped so that coverage gaps remain visible.

Output Format

Results are saved in canonical JSON format (v1.0.0) with:

Run metadata (id, timestamp, branch, commit)
Implementation details (version, capabilities)
Test results (status, SWHID, metrics, errors)
Aggregated statistics

License

GPL-3.0 - See LICENSE file.

Name		Name	Last commit message	Last commit date
Latest commit History 249 Commits
.github		.github
docs		docs
harness		harness
implementations		implementations
payloads		payloads
scripts		scripts
tests		tests
tools		tools
.coverage		.coverage
.coveragerc		.coveragerc
.gitattributes		.gitattributes
.gitignore		.gitignore
BISECTION_PROCESS.md		BISECTION_PROCESS.md
DEVELOPER_GUIDE.md		DEVELOPER_GUIDE.md
IMPLEMENTATIONS.md		IMPLEMENTATIONS.md
LICENSE		LICENSE
PLATFORM_LIMITATIONS.md		PLATFORM_LIMITATIONS.md
README.md		README.md
WORKFLOWS.md		WORKFLOWS.md
bisect_harness.sh		bisect_harness.sh
config.yaml		config.yaml
custom.html		custom.html
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SWHID Testing Harness

Quick Start

Installation

Optional: Install Implementation Dependencies

Basic Usage

Validate Results

Project Structure

Adding Implementations

Multiple Git-Based Implementations

Documentation

Testing

Configuration

Negative Tests

Commit Reference Resolution

Autodiscovery expectations

Unsupported payloads

Output Format

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SWHID Testing Harness

Quick Start

Installation

Optional: Install Implementation Dependencies

Basic Usage

Validate Results

Project Structure

Adding Implementations

Multiple Git-Based Implementations

Documentation

Testing

Configuration

Negative Tests

Commit Reference Resolution

Autodiscovery expectations

Unsupported payloads

Output Format

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages