reviewability

A CI/CD quality gate that scores pull requests by how hard they are to review.

Catch diffs that are too large, too tangled, or too scattered to review safely — before they merge.

It doesn't matter how fast AI generates code — the bottleneck is the human reviewer.

Installation

pip install reviewability

Requires Python 3.12+.

The Idea

A pull request can be hard to review not because the code is poorly written, but because of how the changes are combined. Mixing renames, code moves, and logic changes in one PR makes each harder to verify. This is especially common with AI-generated code. Unlike a linter, Reviewability does not analyze the code itself, only how the changes are structured.

When a diff scores low, the typical remedies are splitting it into focused pull requests or deferring non-essential changes.

Reviewability computes metrics at the level of individual hunks, files, and the whole diff, feeding into Reviewability Scores (0.0 = hardest, 1.0 = easiest) with configurable thresholds for what counts as problematic.

Key Concepts

  • Hunk — a contiguous block of changes within a single file (the smallest unit of analysis)
  • Metric — a calculated value attached to a hunk, a file, or the whole diff
  • Score — a float [0.0, 1.0] representing reviewability at hunk, file, or diff level
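These three concepts can be modeled roughly as follows. This is an illustrative sketch, not the package's actual data model; the class and function names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Hunk:
    """A contiguous block of changes within a single file."""
    file: str
    added: int    # lines added
    removed: int  # lines removed
    context: int  # unchanged surrounding lines

@dataclass
class Metric:
    """A named value attached to a hunk, a file, or the whole diff."""
    name: str
    value: float

def score_ok(score: float, threshold: float = 0.5) -> bool:
    """A score is a float in [0.0, 1.0]; below the (configurable)
    threshold it counts as problematic."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0.0, 1.0]")
    return score >= threshold
```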

Extensibility

The metric system is designed to be extended:

  • Add a metric — subclass HunkMetric, FileMetric, or OverallMetric, implement calculate(), register via registry.add()
  • Adjust scoring — provide a custom ReviewabilityScorer implementation
  • Adjust thresholds — edit reviewability.toml to change what score counts as problematic and what limits trigger violations
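A custom metric might look like the sketch below. The real `HunkMetric` base class and registry live in the reviewability package; the minimal stand-ins here are assumptions so the example runs on its own, and the metric itself is hypothetical.

```python
from abc import ABC, abstractmethod

class HunkMetric(ABC):  # stand-in for reviewability's base class
    name: str
    @abstractmethod
    def calculate(self, hunk) -> float: ...

class Registry:  # stand-in for reviewability's registry
    def __init__(self):
        self.metrics = {}
    def add(self, metric: HunkMetric):
        self.metrics[metric.name] = metric

registry = Registry()

class CommentOnlyRatio(HunkMetric):
    """Hypothetical metric: fraction of added lines that are comments."""
    name = "hunk.comment_only_ratio"

    def calculate(self, hunk) -> float:
        added = [l for l in hunk["added_lines"] if l.strip()]
        if not added:
            return 0.0
        comments = sum(1 for l in added if l.lstrip().startswith("#"))
        return comments / len(added)

registry.add(CommentOnlyRatio())
```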

Usage

# Analyze a range of commits
reviewability HEAD~1 HEAD

# Analyze from stdin
git diff HEAD~1 | reviewability --from-stdin

# Use a custom config
reviewability --config path/to/reviewability.toml HEAD~1 HEAD

# Include per-file and per-hunk breakdowns
reviewability --detailed HEAD~1 HEAD

Output is JSON. Exit code is 0 if the gate passes, 1 if it fails.
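The --config flag above points at a reviewability.toml. A minimal config might look like the following; the key names here are assumptions, so check the shipped defaults for the real schema.

```toml
# Hypothetical reviewability.toml — key names are illustrative.
[thresholds]
problematic_score = 0.5   # hunks/files scoring below this count as problematic

[limits]
max_lines_changed = 800   # violation if the diff exceeds this many changed lines
max_files_changed = 25
```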

Claude Code Skill

If you use Claude Code, a /reviewability skill is included. It runs the tool on the current diff, summarizes the results, and attempts to address any recommendations directly.

Movement Detection

Moved code is easy to review — the logic hasn't changed, only the location. The tool detects when a block of code is deleted from one place and inserted elsewhere (accounting for reindentation and package/import changes), and treats those hunks and files as relocations.

Relocations receive a perfect score and are excluded from the size and churn calculations that drive the overall score. A diff that is large only because of relocations is not penalized.
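One plausible way to detect a relocation is to normalize deleted and added blocks before comparing them. This is a simplified sketch, not the package's actual algorithm, which also accounts for package/import changes.

```python
def normalize(block: list[str]) -> tuple[str, ...]:
    """Strip indentation and blank lines so reindented code still matches."""
    return tuple(line.strip() for line in block if line.strip())

def is_likely_moved(deleted: list[str], added: list[str]) -> bool:
    """True if an added block is the same code as a deleted block,
    ignoring reindentation."""
    return normalize(deleted) == normalize(added) and bool(normalize(deleted))
```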

Metrics

Metrics are calculated at three levels: hunk, file, and overall diff.

Hunk-level

| Metric | Description |
| --- | --- |
| hunk.lines_changed | Total lines added and removed in a hunk |
| hunk.added_lines | Lines added in a hunk |
| hunk.removed_lines | Lines removed in a hunk |
| hunk.context_lines | Unchanged context lines surrounding the change |
| hunk.churn_ratio | Ratio of added lines to total changed lines (0.0 = pure deletion, 1.0 = pure addition) |
| hunk.is_likely_moved | Whether this hunk is a movement of code from another location |

File-level

| Metric | Description |
| --- | --- |
| file.lines_changed | Total lines added and removed across all hunks in a file |
| file.added_lines | Total lines added in a file |
| file.removed_lines | Total lines removed in a file |
| file.hunk_count | Number of separate change regions in a file |
| file.max_hunk_lines | Lines changed in the largest single hunk within a file |
| file.is_likely_moved | Whether this file is a movement from another path |

Overall-level

| Metric | Description |
| --- | --- |
| overall.lines_changed | Total lines changed across the entire diff |
| overall.added_lines | Total lines added across the entire diff |
| overall.removed_lines | Total lines removed across the entire diff |
| overall.files_changed | Number of files changed |
| overall.moved_lines | Total lines in hunks identified as code movements |
| overall.change_entropy | Shannon entropy of the distribution of changes across files |
| overall.largest_file_ratio | Fraction of total diff lines in the most-changed file |
| overall.churn_complexity | Average interleaving of adds and removes across hunks |
| overall.problematic_hunk_count | Hunks with a score below the configured threshold |
| overall.problematic_file_count | Files with a score below the configured threshold |
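For instance, overall.change_entropy and overall.largest_file_ratio can be computed from per-file changed-line counts roughly as follows. This is a sketch under stated assumptions; the package may normalize or weight these differently.

```python
import math

def change_entropy(lines_per_file: dict[str, int]) -> float:
    """Shannon entropy (in bits) of how changed lines spread across files.
    0.0 means all changes sit in one file; higher means more scattered."""
    total = sum(lines_per_file.values())
    if total == 0:
        return 0.0
    probs = [n / total for n in lines_per_file.values() if n > 0]
    return -sum(p * math.log2(p) for p in probs)

def largest_file_ratio(lines_per_file: dict[str, int]) -> float:
    """Fraction of all changed lines in the single most-changed file."""
    total = sum(lines_per_file.values())
    return max(lines_per_file.values()) / total if total else 0.0
```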

Overall Scoring

The overall score is driven by two factors: diff size and churn complexity.

A larger diff is harder to review. A diff where adds and removes are interleaved within the same hunks is harder to review than one where they are separated. The score penalizes both — but only when they occur together. A large but directional diff (e.g. a bulk rename) scores well. A small but tangled diff also scores well. The worst score comes from a diff that is both large and internally mixed.
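One way to make only the combination costly is a multiplicative penalty, as in this illustrative curve (not the package's exact formula; the size_limit value is an assumption):

```python
def overall_score(lines_changed: int, churn_complexity: float,
                  size_limit: int = 800) -> float:
    """Illustrative scoring curve: penalize size and churn only when
    they occur together. churn_complexity is in [0, 1]:
    0 = fully directional, 1 = fully interleaved adds/removes."""
    size_factor = min(lines_changed / size_limit, 1.0)  # 0 = tiny, 1 = huge
    penalty = size_factor * churn_complexity            # high only if both are high
    return 1.0 - penalty
```

A large directional diff (churn near 0) and a small tangled diff (size factor near 0) both stay close to 1.0; only a diff that is large and tangled approaches 0.0.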

Research

Metrics are grounded in peer-reviewed research on code review effectiveness.
