Pact

Contracts before code. Tests as law. Agents that can't cheat.

Pact is a multi-agent software engineering framework where the architecture is decided before a single line of implementation is written. Tasks are decomposed into components, each component gets a typed interface contract, and each contract gets executable tests. Only then do agents implement -- independently, in parallel, even competitively -- with no way to ship code that doesn't honor its contract. Generates Python, TypeScript, or JavaScript.

The insight: LLMs are unreliable reviewers but tests are perfectly reliable judges. So make the tests first, make them mechanical, and let agents iterate until they pass. No advisory coordination. No "looks good to me." Pass or fail.

When to Use Pact

Pact is for projects where getting the boundaries right matters more than getting the code written fast. If a single Claude or Codex session can build your feature in one pass, just do that -- Pact's decomposition, contracts, and multi-agent coordination would be pure overhead.

Use Pact when:

The task has multiple interacting components with non-obvious boundaries
You need provable correctness at interfaces -- not "it seems to work" but "it passes 200 contract tests"
The system will be maintained by agents who need contracts to understand what each piece does
You want competitive or parallel implementation where multiple agents race on the same component
The codebase is large enough that no single context window can hold it all

Don't use Pact when:

A single agent can build the whole thing in one shot
The task is a bug fix, refactor, or small feature
You'd spend more time on contracts than on the code itself

Benchmark: ICPC World Finals

Tested on 5 ICPC World Finals competitive programming problems (212 test cases total) using Claude Opus 4.6.

Condition	Pass Rate	Cost
Claude Code single-shot	167/212 (79%)	$0.60
Claude Code iterative (5 attempts)	196/212 (92%)	$1.26
Pact (solo, noshape)	212/212 (100%)	~$13

Pact's contract-first pipeline solves problems that iterative prompting cannot. On Trailing Digits (2020 World Finals), Claude Code scores 31/47 even with 5 retry iterations and full test feedback -- the naive algorithm times out on large inputs. Pact's interview and decomposition phases force upfront mathematical analysis, producing the correct O(log n) approach on the first implementation attempt.

Full results: icpc_official/RESULTS.md in the benchmark directory.

Philosophy: Contracts Are the Product

Pact treats contracts as source of truth and implementations as disposable artifacts. The code is cattle, not pets.

When a module fails in production, the response isn't "debug the implementation." It's: add a test that reproduces the failure to the contract, flush the implementation, and let an agent rebuild it. The contract got stricter. The next implementation can't have that bug. Over time, contracts accumulate the scar tissue of every production incident -- they become the real engineering artifact.

This inverts the traditional relationship between code and tests. Code is cheap (agents generate it in minutes). Contracts are expensive (they encode hard-won understanding of what the system actually needs to do). Pact makes that inversion explicit: you spend your time on contracts, agents spend their time on code.

Quick Start

git clone https://github.com/jmcentire/pact.git
cd pact
make
source .venv/bin/activate

That's it. Now try:

pact init my-project
# Edit my-project/task.md with your task
# Edit my-project/sops.md with your standards
pact --help

How It Works

Task
  |
  v
Interview --> Shape (opt) --> Decompose --> Contract --> Test
                                                          |
                                                          v
                                    Implement (parallel, competitive)
                                                          |
                                                          v
                                    Integrate (glue + parent tests)
                                                          |
                                                          v
                                    Arbiter Gate (access graph + trust)
                                                          |
                                                          v
                                    Polish (Goodhart tests + regression)
                                                          |
                                                          v
                                    Certify (tamper-evident proof)

Nine phases (plus diagnose as a recovery state):

Interview -- Establish processing register, then identify risks, ambiguities, ask clarifying questions
Shape -- (Optional) Produce a Shape Up pitch: appetite, breadboard, rabbit holes, no-gos
Decompose -- Task into 2-7 component tree, guided by shaping context if present. Contract generation, test authoring (including emission compliance tests), and Goodhart adversarial tests all happen here.
Implement -- Each component built independently by a code agent with structured event emission
Integrate -- Parent components composed via glue code
Arbiter -- Generate access_graph.json, register with Arbiter for blast radius analysis. HUMAN_GATE pauses pipeline.
Polish -- Cross-component regression check + Goodhart test evaluation with graduated-disclosure remediation
Complete -- Certification with tamper-evident proof (SHA-256 hashes)

Diagnose is not a numbered phase — it's a recovery state. On failure at any phase, the system enters diagnose for I/O tracing, root cause analysis, and recovery routing back to implement.

Stack Integration

Pact is the contract-first build system in a larger stack:

Tool	Role	Pact's Relationship
Constrain	Upstream policy	`--constrain-dir` seeds decomposition with constraints, component maps, trust policies
Arbiter	Trust gate	Phase 8.5 POSTs `access_graph.json` for blast radius analysis. HUMAN_GATE pauses pipeline
Ledger	Field-level audit	`--ledger-dir` loads assertions into contract test suites as hard requirements
Sentinel	Production monitoring	Separate package. Pact embeds PACT keys for attribution. `pact sentinel push-contract` accepts tightened contracts

All integrations are optional. Without them, Pact operates as a standalone build system.

Contract Schema

Every contract includes:

data_access:
  reads: [PUBLIC, PII]
  writes: [PUBLIC]
  rationale: "Reads user.email for personalization, writes public analytics events"
  side_effects:
    - type: database_read
      classification: PII
      fields: ["user.email", "user.created_at"]
      rationale: "Fetch user profile for display"

authority:
  domains: ["user_profile"]
  rationale: "Authoritative source for user profile data within this service"

Anti-cliche enforcement rejects vague rationale strings ("handles data", "manages stuff"). Rationale must describe the specific data accessed and why.

Canonical Types with Validators

Contracts encourage defining canonical data structures with validators rather than passing raw primitives. A field like email: str becomes a validated type with a regex constraint; amount: float carries range and precision rules. The ValidatorSpec schema supports range, regex, length, and custom rules:

types:
  - name: PaymentAmount
    kind: struct
    fields:
      - name: value
        type_ref: float
        validators:
          - kind: range
            expression: "0.01 <= value <= 999999.99"
            error_message: "Amount must be between $0.01 and $999,999.99"
      - name: currency
        type_ref: str
        validators:
          - kind: regex
            expression: "^[A-Z]{3}$"
            error_message: "Currency must be a 3-letter ISO 4217 code"

Tests automatically verify both acceptance and rejection: valid instances pass, invalid inputs fail with appropriate errors, and boundary values behave correctly. Implementations render these as Pydantic models (Python), Zod schemas (TypeScript), or class constructors with validation (JavaScript).

Audit Repo Separation

Pact supports a two-repo separation-of-privilege model where the coding agent and auditing agent operate in different repositories:

pact audit-init ./my-project --audit-dir ./my-project-audit
pact sync ./my-project          # Sync visible tests (never Goodhart) to code repo
pact certify ./my-project       # Tamper-evident certification proof

The coding agent cannot modify the tests that judge its work. The certification artifact includes SHA-256 hashes of all contracts, tests, and implementations with a self-integrity hash.

Structured Event Emission

All implementations accept optional event_handler and log_handler. Every public method emits structured events:

self._emit({
    "pact_key": "PACT:auth_module:validate_token",
    "event": "completed",
    "output_classification": ["PII"],
    "side_effects": ["database_read"],
    "ts": time.time_ns()
})

PACT keys are string literals (not computed) so Sentinel can discover them via static analysis. Emission compliance tests are auto-generated from the contract interface. See PACT_KEY_STANDARD.md for the canonical format specification.

Health Monitoring

Pact monitors its own coordination health -- detecting the specific failure modes of agentic pipelines before they consume the budget.

Metric	What It Detects	Active Phases
Output/planning ratio	Spending $50 on planning and shipping nothing	decompose+
Rejection rate	Agents optimizing for each other's approval, not outcomes	all
Budget velocity	Coordination cost exceeding execution value	decompose+
Phase balance	Any single phase consuming disproportionate budget	all
Cascade detection	One component's failure propagating through the tree	all
Register drift	Agent departing from established processing mode mid-task	implement+

Health checks are phase-aware: artifact-production metrics (output/planning ratio, budget velocity) only activate after the decompose phase begins. Interview and shape phases are planning-only by design -- applying artifact-production checks there would trigger false positives and block the pipeline.

pact health my-project

Two Execution Levers

Lever	Config Key	Effect
Parallel Components	`parallel_components: true`	Independent components implement concurrently
Competitive Implementations	`competitive_implementations: true`	N agents implement the SAME component; best wins

Either, neither, or both. Defaults: both off (sequential, single-attempt).

CLI Commands

Command	Purpose
`pact init <project>`	Scaffold a new project
`pact run <project>`	Run the pipeline
`pact daemon <project>`	Event-driven mode (recommended)
`pact status <project>`	Show project or component status
`pact components <project>`	List components with status
`pact build <project> <id>`	Build/rebuild a specific component
`pact validate <project>`	Re-run contract validation
`pact audit <project>`	Spec-compliance audit
`pact certify <project>`	Run certification (all tests, tamper-evident proof)
`pact audit-init <project>`	Initialize audit repo separation
`pact sync <project>`	Sync visible tests from audit repo
`pact sentinel status`	Show Sentinel/Arbiter connection config
`pact sentinel push-contract <id> <file>`	Accept tightened contract from Sentinel
`pact sentinel list-keys`	List all PACT keys in project
`pact health <project>`	Show health metrics and proposed remedies
`pact tasks <project>`	List phase tasks with status
`pact handoff <project> <id>`	Render/validate handoff brief
`pact adopt <project>`	Adopt existing codebase under pact governance
`pact mcp-server`	Run MCP server (stdio transport)

Run flags: --constrain-dir, --ledger-dir, --skip-arbiter.

Configuration

Per-project (pact.yaml in project directory):

budget: 25.00
parallel_components: true
competitive_implementations: true

# Stack integration (all optional)
constrain_dir: ./constrain-output/
ledger_dir: ./ledger-export/
arbiter_endpoint: http://localhost:8080
skip_arbiter: false

# Audit repo separation
audit_dir: ../my-project-audit
audit_mode: code    # "audit" | "code" | ""

# Shaping
shaping: true
shaping_depth: standard

# Health thresholds
health_thresholds:
  output_planning_ratio_warning: 0.3
  rejection_rate_critical: 0.9

Multi-Provider Configuration

Route different roles to different providers for cost optimization:

role_models:
  decomposer: claude-opus-4-6
  contract_author: claude-opus-4-6
  test_author: claude-sonnet-4-5-20250929
  code_author: gpt-4o

role_backends:
  decomposer: anthropic
  code_author: openai

Available backends: anthropic, openai, gemini, claude_code, claude_code_team.

Project Structure

my-project/
  task.md              # What to build
  sops.md              # How to build it
  pact.yaml            # Budget and config
  access_graph.json    # Data access graph (consumed by Arbiter)
  decomposition/       # Decomposition tree, decisions, interview
  contracts/<cid>/     # Interface specs with data_access + authority
  src/<cid>/           # Implementation source + glue code
  tests/<cid>/         # Contract tests + Goodhart tests
  certification/       # Tamper-evident certification proof
  .pact/               # Ephemeral run state (gitignored)

MCP Server

pip install pact-agents[mcp]
pact-mcp

7 tools for Claude Code integration: status, contracts, budget, validate, resume.

Codebase Analysis (Tool Index)

When adopting or analyzing a codebase, Pact can leverage external tools for richer understanding. All tools are optional -- if not installed, they're silently skipped.

Tool	What It Provides	Install
universal-ctags	Symbol index (functions, classes, members, scope, signatures)	`brew install universal-ctags`
cscope	Cross-references and call graph (best for C/C++)	`brew install cscope`
tree-sitter	Full CST, error-tolerant parsing, cross-language (preferred for Python/TS/JS)	`pip install pact-agents[analysis]`
kindex	Existing project knowledge from the knowledge graph	kindex

Tree-sitter is preferred over cscope for Python, TypeScript, and JavaScript codebases. Both pact adopt and green-field workflows benefit -- agents receive enriched context about symbol hierarchies, class structure, and existing project knowledge.

# pact.yaml
tool_index_enabled: true  # true | false | null (auto-detect)

Development

make dev          # Install with LLM backend support
make test         # Run full test suite (1811 tests)
make test-quick   # Stop on first failure

Requires Python 3.12+. Core dependencies: pydantic and pyyaml.

Architecture

See CLAUDE.md for the full technical reference.

Background

Pact is one of three systems (alongside Emergence and Apprentice) built to test the ideas in Beyond Code: Context, Constraints, and the New Craft of Software.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 82 Commits
.claude/commands		.claude/commands
.github/workflows		.github/workflows
docs		docs
src/pact		src/pact
tests		tests
.gitignore		.gitignore
.kin		.kin
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
Makefile		Makefile
PACT_KEY_STANDARD.md		PACT_KEY_STANDARD.md
README.md		README.md
component_map.yaml		component_map.yaml
config.yaml		config.yaml
constraints.yaml		constraints.yaml
prompt.md		prompt.md
pyproject.toml		pyproject.toml
schema_hints.yaml		schema_hints.yaml
task_r2.md		task_r2.md
task_r3.md		task_r3.md
trust_policy.yaml		trust_policy.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pact

When to Use Pact

Benchmark: ICPC World Finals

Philosophy: Contracts Are the Product

Quick Start

How It Works

Stack Integration

Contract Schema

Canonical Types with Validators

Audit Repo Separation

Structured Event Emission

Health Monitoring

Two Execution Levers

CLI Commands

Configuration

Multi-Provider Configuration

Project Structure

MCP Server

Codebase Analysis (Tool Index)

Development

Architecture

Background

Related

License

About

Uh oh!

Releases 15

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Pact

When to Use Pact

Benchmark: ICPC World Finals

Philosophy: Contracts Are the Product

Quick Start

How It Works

Stack Integration

Contract Schema

Canonical Types with Validators

Audit Repo Separation

Structured Event Emission

Health Monitoring

Two Execution Levers

CLI Commands

Configuration

Multi-Provider Configuration

Project Structure

MCP Server

Codebase Analysis (Tool Index)

Development

Architecture

Background

Related

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 15

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages