The evolving contract between your code and y

## The Intent / Implementation Distinction
A spec describes a problem. Code describes a solution. In theory those are separate layers. In practice, specs contain implementation constraints, code encodes intent in comments and type names, and issues are scoped to implementation details. Neither artifact tells the full story alone.
Re-spec-d evaluates drift against whatever the spec actually defines. Specs can describe any combination of these layers:
- Intent — the goal. "Users must authenticate before accessing protected resources."
- Outcome — the observable result. "Invalid credentials return 401 within 300ms."
- Implementation — the prescribed approach. "Use bcrypt with 12 rounds."
Re-spec-d watches for drift across code, specs, docs, and LLM instruction files. When something falls out of sync, it opens a Redline — a structured, reviewable record. Nothing is amended until a Redline passes through a configurable Gate (human, agent, or escalation policy). The spec is the contract, and every decision is auditable.
Code, specs, docs, and LLM artifacts drift apart silently. Nobody knows when they fell out of sync or which one is right. Copilots and agents make this worse — they ship committed code in minutes against spec context they didn't know was stale.
Specs are supposed to describe a problem and code a solution, but in practice that line doesn't hold. Specs get written solution-first. Code carries intent in comments and type names. Issues are scoped to implementation details. Problem-language and solution-language end up scattered across every artifact with no contract between them.
Re-spec-d is that contract.
Re-spec-d runs two separate flows. Flow 1, spec standardization, runs when a spec changes. Flow 2, drift detection, runs when code changes. The flows run independently, but the output of the first feeds the second.
Specs are standardized and scored before they enter the pipeline as drift anchors. A spec that hasn't been assessed can't be reliably used for classification.
Spec created or edited
│
▼
Deterministic structural check
│ does it have a problem statement, outcomes, acceptance criteria?
│ are implementation constraints mixed into the intent section?
│ score computed across dimensions: problem clarity, outcome coverage,
│ implementation separation, drift resistance
▼
LLM-1: Spec Standardization
│ spec is solution-first? LLM-1 infers the implied problem
│ mixed content? LLM-1 separates intent from implementation constraints
│ missing sections? LLM-1 drafts candidates — marked LLM-inferred, not overwritten
│ final score stored against the spec
▼
Spec stored as a rated, structured anchor
│ low-scoring specs flagged — Redlines against them carry less Gate weight
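
The deterministic structural check in Flow 1 could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `SpecSections` shape, the marker list, the equal weighting, and the `flagBelow` default of 0.5 are all assumptions. The drift-resistance dimension is left out because the text does not define how it is computed.

```typescript
// Hypothetical shape for a parsed spec; field names are assumptions.
interface SpecSections {
  problemStatement: string;
  outcomes: string[];
  acceptance: string[];
  intentText: string; // prose of the intent section only
}

interface SpecScore {
  problemClarity: number;
  outcomeCoverage: number;
  implementationSeparation: number;
  overall: number;
  flagged: boolean;
}

// Assumed, deliberately crude markers of implementation detail in intent prose.
const IMPLEMENTATION_MARKERS: RegExp[] = [/\bbcrypt\b/i, /\bJWT\b/, /\bPOST\b/, /\brounds\b/i];

function scoreSpec(spec: SpecSections, flagBelow = 0.5): SpecScore {
  // Does the spec state a problem at all?
  const problemClarity = spec.problemStatement.trim().length > 0 ? 1 : 0;
  // Full marks need both outcomes and acceptance criteria.
  const outcomeCoverage =
    (spec.outcomes.length > 0 ? 0.5 : 0) + (spec.acceptance.length > 0 ? 0.5 : 0);
  // Penalize implementation constraints mixed into the intent section.
  const leaks = IMPLEMENTATION_MARKERS.filter((m) => m.test(spec.intentText)).length;
  const implementationSeparation = 1 - leaks / IMPLEMENTATION_MARKERS.length;
  // Equal weighting is an assumption.
  const overall = (problemClarity + outcomeCoverage + implementationSeparation) / 3;
  return {
    problemClarity,
    outcomeCoverage,
    implementationSeparation,
    overall,
    flagged: overall < flagBelow,
  };
}
```

A spec that scores below `flagBelow` would be the kind of low-scoring anchor whose Redlines carry less Gate weight.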
Code changes are classified against the rated specs produced by Flow 1.
git commit
│
▼
Diff Extraction (AST — deterministic)
│ structured IR of named changes — "loginUser: param_removed 'password'"
▼
Spec Matching (deterministic, fan-out)
│ concept + file path heuristics — which rated specs are relevant?
│ multiple specs can match — all checked independently
▼
RAG Context Retrieval (deterministic)
│ RAG-1: retrieve relevant spec chunks for each matched spec
│ RAG-2: retrieve relevant code context for the changed entities
│ symbol graph, git history, lexical index — assembled into context bundle
▼
LLM-2: Intent Classification (per matched spec)
│ receives: AST IR + RAG-1 spec context + RAG-2 code context
│ "What is the intent shift score for this change against this spec?"
│ → intent_score: 0.0 – 1.0 (not a boolean)
│ → drift_type: refactor | implementation_change | behavior_change |
│ spec_violation | spec_outdated
│ → intent_change_summary (plain language, required if score > 0)
▼
Decision Engine (deterministic)
│ intent_score < sensitivity threshold → log only; Redline stored but not surfaced
│ intent_score >= threshold → Redline created, all scores stored
│ confidence < confidence floor → log only
▼
Redline Gate (configurable: human | agent | escalation policy)
│ all Redlines are stored regardless of score
│ Gate has configurable acceptance parameters:
│ - minimum intent_score to surface for review
│ - auto-approve below a score threshold (low-risk path)
│ - escalate to human above a score threshold (high-risk path)
│ PATCH /api/redlines/:id
├── accepted_as_draft → Change Request created → spec amended
├── rejected → dismissed, reason logged
└── merged_into_existing → folded into existing spec
The Redline stores the full classification: intent_score, drift_type, and intent_change_summary. That's what the reviewer reads. Low-intent Redlines are kept in the audit trail but filtered from the active review queue by default.
A Change Request is never created directly from drift detection. Redlines are the mandatory gate.
Specs define contracts. Contracts have layers:
- Intent — the goal. "Users must authenticate before accessing protected resources."
- Outcome — the observable result. "Invalid credentials return 401 within 300ms."
- Implementation — the prescribed approach. "Use bcrypt with 12 rounds."
A spec can describe any or all of these layers. Re-spec-d evaluates drift against whatever the spec actually defines — because any layer can be the specified contract.
What it doesn't do: flag every code change. Code changes are implementation signals, not intent signals. LLM-2 is explicitly asked to score the degree of intent shift — not just true/false:
| drift_type | intent_score | What happened |
|---|---|---|
| `refactor` | ~0.0 | Code changed, intent preserved, spec still accurate |
| `implementation_change` | ~0.1–0.3 | Behavior changed, goal is the same; spec wording may need a minor update |
| `behavior_change` | ~0.4–0.7 | Observable behavior changed and intent appears to have shifted |
| `spec_violation` | ~0.7–1.0 | Code breaks acceptance criteria that are still valid |
| `spec_outdated` | ~0.5–1.0 | Intent evolved; the spec describes an old goal |
The Gate's acceptance threshold is configurable per spec. Low-scoring Redlines are stored but filtered from the active review queue. High-scoring ones escalate.
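
The shape of an LLM-2 result can be checked before it reaches the Decision Engine. The sketch below hand-rolls that check so it stays dependency-free; the project itself is described as using Zod for this, and the function name `parseClassification` and the error messages are assumptions made for illustration.

```typescript
type DriftType =
  | "refactor"
  | "implementation_change"
  | "behavior_change"
  | "spec_violation"
  | "spec_outdated";

interface Classification {
  intent_score: number;
  drift_type: DriftType;
  intent_change_summary: string | null;
}

const DRIFT_TYPES: readonly string[] = [
  "refactor",
  "implementation_change",
  "behavior_change",
  "spec_violation",
  "spec_outdated",
];

// Hypothetical validator: throws on any result that breaks the stated contract.
function parseClassification(raw: string): Classification {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const obj = data as Record<string, unknown>;

  const score = obj["intent_score"];
  if (typeof score !== "number" || score < 0 || score > 1) {
    throw new Error("intent_score must be a number in [0, 1]");
  }
  const driftType = obj["drift_type"];
  if (typeof driftType !== "string" || DRIFT_TYPES.indexOf(driftType) === -1) {
    throw new Error("drift_type is not a known value");
  }
  const summary = obj["intent_change_summary"];
  const hasSummary = typeof summary === "string" && summary.trim() !== "";
  // The summary is required whenever the score is above zero.
  if (score > 0 && !hasSummary) {
    throw new Error("intent_change_summary is required when intent_score > 0");
  }
  return {
    intent_score: score,
    drift_type: driftType as DriftType,
    intent_change_summary: hasSummary ? (summary as string) : null,
  };
}
```

Rejecting malformed results at this boundary keeps the downstream rules deterministic: they only ever see a score in range and a known `drift_type`.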
The split here is not clean, and that's by design.
| Deterministic | AI tends to lead | Human tends to lead |
|---|---|---|
| AST extraction | LLM-1: spec standardization | Spec authoring |
| Spec matching | LLM-2: intent classification | Redline review / approval |
| RAG retrieval | Redline body (intent_change_summary) | Rejection rationale |
| Decision engine | | Sensitivity tuning |
| | | Deciding crType on ambiguous drift |
| | | Setting driver (spec vs code) |

Both participate:
- Spec quality review
- Reviewing classification confidence
- Approving/rejecting derived outcomes
- Writing acceptance criteria
In agentic workflows, an orchestrator agent may review Redlines that a coding agent produced. The gate still exists. The reviewer is configurable — but for critical specs (sensitivity: strict), a human reviewer is required regardless.
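
Gate routing with these rules might look like the sketch below. It is an illustration under assumptions: the `GateConfig` field names are invented for the three acceptance parameters described earlier, and the order of the checks (strict specs go to a human before the auto-approve path is considered) is one plausible reading rather than documented behavior.

```typescript
type Reviewer = "human" | "agent";

type GateRoute =
  | { action: "store_only" }
  | { action: "auto_approve" }
  | { action: "review"; reviewer: Reviewer };

// Hypothetical names for the Gate's configurable acceptance parameters.
interface GateConfig {
  minScoreToSurface: number;
  autoApproveBelow: number;
  escalateToHumanAbove: number;
  defaultReviewer: Reviewer;
}

function routeRedline(intentScore: number, sensitivity: string, gate: GateConfig): GateRoute {
  // Every Redline is stored; low scores are simply not surfaced.
  if (intentScore < gate.minScoreToSurface) return { action: "store_only" };
  // Critical specs always get a human reviewer, regardless of configuration.
  if (sensitivity === "strict") return { action: "review", reviewer: "human" };
  // Low-risk path.
  if (intentScore < gate.autoApproveBelow) return { action: "auto_approve" };
  // High-risk path.
  if (intentScore > gate.escalateToHumanAbove) return { action: "review", reviewer: "human" };
  return { action: "review", reviewer: gate.defaultReviewer };
}
```

With an agent as `defaultReviewer`, mid-range Redlines stay in the agentic loop while strict specs and high scores still reach a person.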
Developer commits to src/auth/login.ts
│ (post-commit hook or CI)
▼
Diff extracted: "loginUser: param_removed 'password', branch_removed 'if (!password)'"
│
▼
Spec matched: SPEC_AUTH_LOGIN (concept: authentication, file: src/auth/*)
│
▼
RAG context assembled: spec chunks for SPEC_AUTH_LOGIN (RAG-1)
+ code context for loginUser (RAG-2)
│
▼
LLM-2 classifies:
intent_score: 0.92
drift_type: spec_violation
intent_change_summary: "The function previously enforced password presence as a
security contract. That guard is gone. The spec says password validation is required."
│
▼
Decision: create_redline (crType: amend_code) — score exceeds Gate threshold
│
▼
Redline created (status: proposed)
│ → appears in dashboard, PR comment, Slack notification
▼
Reviewer reads intent_change_summary and intent_score, approves
│
▼
Change Request created: amend_code — "Restore password validation in loginUser"
Developer commits: extracts loginUser into separate auth module
│
▼
Diff extracted: "loginUser: moved, signature unchanged, logic preserved"
│
▼
RAG context assembled: spec chunks for SPEC_AUTH_LOGIN + code context for loginUser
│
▼
LLM-2 classifies:
intent_score: 0.05
drift_type: implementation_change
│
▼
Decision: suggest_spec_update (advisory only) — score below Gate threshold
│ "Spec references src/auth/login.ts — consider updating file path"
│ Redline stored with low score. Not surfaced for review.
▼
Developer can optionally update the spec's file path reference
Orchestrator creates GitHub Issue: "Add rate limiting to login endpoint"
│ → Issue allocated to Coding Agent
▼
Coding Agent implements the change, commits
│
▼
Re-spec-d checks drift against SPEC_AUTH_LOGIN
│
▼
RAG context assembled: spec chunks for SPEC_AUTH_LOGIN + code context for rate limit impl
│
▼
LLM-2 classifies:
intent_score: 0.74
drift_type: spec_outdated
intent_change_summary: "Spec says nothing about rate limiting. Code now enforces
it. The spec's contract has expanded."
│
▼
Redline created (status: proposed) — score exceeds Gate threshold
│
▼
Orchestrator Agent reviews Redline
│ (if SPEC_AUTH_LOGIN has sensitivity: strict → escalates to human)
▼
Approved → Change Request: amend_spec
│
▼
Spec updated: "Account requests are rate limited to 10/minute per IP"
│
▼
Other agents receive updated spec context on next retrieval
Developer edits specs/auth.md directly
│ removes an acceptance criterion
▼
Spec watcher detects the edit
│
▼
Spec Redline created (source: spec_edit)
│ "Acceptance criterion removed: 'Account locks after 5 failed attempts'"
▼
If low-risk (additive changes only) → auto_approved
If breaking (removal / narrowing) → requires review
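
The additive-versus-breaking decision for a direct spec edit can be approximated by comparing the acceptance criteria before and after the edit. This is a minimal sketch with assumed names (`classifySpecEdit`, `SpecEditRisk`); it compares criteria as exact strings, so a reworded criterion shows up as one removal plus one addition and is routed to review. It does not attempt to detect narrowing inside an unchanged criterion.

```typescript
type SpecEditRisk = "auto_approved" | "requires_review";

interface SpecEditResult {
  risk: SpecEditRisk;
  removed: string[];
  added: string[];
}

// Hypothetical helper: diff two versions of a spec's acceptance list.
function classifySpecEdit(before: string[], after: string[]): SpecEditResult {
  const beforeSet = new Set(before);
  const afterSet = new Set(after);
  const removed = before.filter((c) => !afterSet.has(c));
  const added = after.filter((c) => !beforeSet.has(c));
  // Any removal is treated as breaking; purely additive edits are low-risk.
  const risk: SpecEditRisk = removed.length > 0 ? "requires_review" : "auto_approved";
  return { risk, removed, added };
}
```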
Specs are Markdown files with structured frontmatter:
---
id: SPEC_AUTH_LOGIN
status: open
driver: spec
version: 3
sensitivity: strict
concepts:
- authentication
- login
- password-validation
acceptance:
- User with valid credentials receives a JWT within 500ms
- Invalid password returns 401 (not 403, not 500)
- Account locks after 5 consecutive failed attempts
- Locked account cannot log in even with correct password
---
# Auth: Login
Users authenticate via email/password. The system validates credentials
and enforces lockout policy.
## Implementation
- `POST /auth/login` accepts `{ email, password }`
- Uses bcrypt comparison (12 rounds)
- Returns JWT on success
- Returns 401 on invalid credentials
- Returns 423 when account is locked

A spec can describe intent, outcomes, implementation details, or any combination. Re-spec-d evaluates drift against whatever contract the spec defines.
Sensitivity controls how aggressively the system flags drift for a given spec. An auth spec should be paranoid. A UI layout spec can be lenient.
0.0 ──────────── 0.5 ──────────── 1.0
loose balanced strict
(ignore noise) (default) (flag everything)
| Preset | Level | When to use |
|---|---|---|
| `loose` | 0.1 | Specs you're not actively enforcing (experiments, UI polish) |
| `balanced` | 0.5 | Default. Catches clear violations, ignores noise |
| `strict` | 0.85 | Critical specs (auth, payments, compliance) |
| `paranoid` | 1.0 | Maximum sensitivity. Audits only. |
# In spec frontmatter:
sensitivity: strict # named preset
sensitivity: 0.75 # numeric
sensitivity:            # full override
  level: 0.9
  confidenceThreshold: 0.4

Sensitivity affects the spec matching thresholds, the LLM confidence required to act, and how strictly acceptance criteria are evaluated.
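
The three frontmatter forms (named preset, number, full override) can be normalized into one shape before the pipeline uses them. The sketch below assumes the names `resolveSensitivity` and `ResolvedSensitivity`, and it assumes a default `confidenceThreshold` of 0.5, which the text does not specify. The preset levels come from the table above.

```typescript
type Preset = "loose" | "balanced" | "strict" | "paranoid";

type SensitivityInput = Preset | number | { level: number; confidenceThreshold?: number };

interface ResolvedSensitivity {
  level: number;
  confidenceThreshold: number;
}

const PRESET_LEVELS: Record<Preset, number> = {
  loose: 0.1,
  balanced: 0.5,
  strict: 0.85,
  paranoid: 1.0,
};

// Assumption: the source does not state a default confidence threshold.
const DEFAULT_CONFIDENCE_THRESHOLD = 0.5;

function clamp01(n: number): number {
  return Math.min(1, Math.max(0, n));
}

function resolveSensitivity(input: SensitivityInput = "balanced"): ResolvedSensitivity {
  if (typeof input === "string") {
    return { level: PRESET_LEVELS[input], confidenceThreshold: DEFAULT_CONFIDENCE_THRESHOLD };
  }
  if (typeof input === "number") {
    return { level: clamp01(input), confidenceThreshold: DEFAULT_CONFIDENCE_THRESHOLD };
  }
  return {
    level: clamp01(input.level),
    confidenceThreshold: clamp01(input.confidenceThreshold ?? DEFAULT_CONFIDENCE_THRESHOLD),
  };
}
```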
The system is designed so the LLMs only see a tiny, pre-targeted slice of the work. Every layer before the LLMs is deterministic and cheap.
Full codebase
├─ Whitespace / formatting / comments ← eliminated by AST parser
├─ Unchanged files ← eliminated by git diff
├─ Non-code changes ← filtered by riskHint
├─ Files matching no spec concept ← eliminated by spec matcher
├─ Specs with weak match scores ← eliminated by sensitivity floor
│
└─► Targeted IR ← this is what LLM-1 and LLM-2 receive
(typically: 1–5 entities × 1–3 specs)
A typical commit with 5 changed files and 2 matched specs = 2 LLM-1 calls (standardization) + 2 LLM-2 calls (classification), each at Tier 1 or Tier 2.
Each tier defines how much of the changed file LLM-2 sees. RAG context is assembled independently and included at every tier — it does not escalate, but its token budget scales with the tier.
At every tier, LLM-2 receives:
- The semantic IR from AST (always present)
- RAG-1: relevant spec chunks for the matched spec
- RAG-2: relevant code context for the changed entities (symbol graph, git history, lexical matches, vector search)
The tier controls how much of the raw changed file is added on top of that:
Tier 1: Semantic IR + RAG context
~1–2k tokens. Resolves ~70% of cases.
IR gives LLM-2 the structured "what changed".
RAG gives it the "what does this relate to" from both spec and code.
│ confident? → done ✅
│ no ↓
Tier 2: + raw git diff hunks
~3–5k tokens. Resolves ~20% more.
Diff fills in detail the IR abstracted away.
│ confident? → done ✅
│ no ↓
Tier 3: + full before/after entity source
~8–15k tokens. Resolves most of the rest.
Full source for when diff context isn't enough to judge intent.
│ confident? → done ✅
│ no ↓
Tier 4: needs_annotation
Developer asked: "what does this change do?"
Their answer is added to the context, and classification re-runs from Tier 1.
RAG context is capped by a token budget per tier. It never triggers a new tier on its own.
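
The escalation rule itself is a small state machine, sketched here as a pure function. The names (`nextStep`, `classifyWithEscalation`) are assumptions, and the synchronous `confidenceAt` callback stands in for a real LLM-2 call, which would be asynchronous in practice.

```typescript
type LlmTier = 1 | 2 | 3;

type Step =
  | { kind: "done"; tier: LlmTier }
  | { kind: "escalate"; to: LlmTier }
  | { kind: "needs_annotation" }; // Tier 4: ask the developer

// Decide what happens after classifying at a given tier.
function nextStep(tier: LlmTier, confidence: number, confidenceFloor: number): Step {
  if (confidence >= confidenceFloor) return { kind: "done", tier };
  if (tier === 3) return { kind: "needs_annotation" };
  return { kind: "escalate", to: tier === 1 ? 2 : 3 };
}

// Walk the tiers until a result is confident or a human answer is needed.
// `confidenceAt` is a hypothetical stand-in for running LLM-2 at that tier.
function classifyWithEscalation(
  confidenceAt: (tier: LlmTier) => number,
  confidenceFloor: number,
): Step {
  let tier: LlmTier = 1;
  while (true) {
    const step = nextStep(tier, confidenceAt(tier), confidenceFloor);
    if (step.kind !== "escalate") return step;
    tier = step.to;
  }
}
```

Keeping the transition pure means the tier policy can be tested without any model call, and a developer annotation simply restarts the loop at Tier 1 with richer context.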
Code is parsed into a Semantic Change Log — structured IR of named atomic changes:
{
"entity": "loginUser",
"file": "src/auth/login.ts",
"kind": "function",
"riskHint": "behavioral",
"changes": [
{ "type": "param_removed", "name": "password" },
{ "type": "branch_removed", "description": "if (!password)" },
{ "type": "jsdoc_changed", "description": "@param password removed from JSDoc" }
]
}

| Language | Extractor | Resolution |
|---|---|---|
| TypeScript / JavaScript | ts-morph | Type-resolved: inferred types, cross-file exports, JSDoc |
| Python | tree-sitter-python | Functions, classes, decorators, docstrings |
| Go | tree-sitter-go | Functions, structs, interfaces, exported symbols |
| Rust | tree-sitter-rust | Functions, traits, impls, pub items |
| SQL | tree-sitter-sql | Schema changes: columns, constraints, indexes |
| GraphQL | tree-sitter-graphql | Type and field additions/removals |
| Protobuf | tree-sitter-proto | Message and field changes |
| HCL / Terraform | tree-sitter-hcl | Resource additions and removals |
| Others (17 more) | tree-sitter-* | Syntax-level, falls back to raw diff |
Inline documentation (JSDoc, docstrings) is a spec signal. A @param removal is extracted as a jsdoc_changed entry and treated as a potential intent signal.
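
A Semantic Change Log entry can be typed and rendered into the compact one-line form that appears in the flow diagrams (for example `loginUser: param_removed 'password'`). The interface and function names below are assumptions; the field names mirror the JSON example above.

```typescript
// Field names follow the JSON example; the type names are hypothetical.
interface AtomicChange {
  type: string;
  name?: string;
  description?: string;
}

interface SemanticChangeEntry {
  entity: string;
  file: string;
  kind: string;
  riskHint: string;
  changes: AtomicChange[];
}

// Render one entry as "entity: type 'detail', type 'detail'".
function summarizeEntry(entry: SemanticChangeEntry): string {
  const parts = entry.changes.map((c) => {
    const detail = c.name ?? c.description;
    return detail === undefined ? c.type : `${c.type} '${detail}'`;
  });
  return `${entry.entity}: ${parts.join(", ")}`;
}
```

A one-line summary like this is cheap to log and easy to read in a Redline, while the full structured entry remains available to the classifier.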
| Condition | Action |
|---|---|
| `confidence < threshold` | Log only |
| `intent_score < sensitivity floor` | Log only, Redline stored, not surfaced |
| `intent_score >= threshold` + `spec_violation` + `driver: spec` | `create_redline` (crType: `amend_code`) |
| `intent_score >= threshold` + `spec_outdated` + `driver: code` | `create_redline` (crType: `amend_spec`) |
| `intent_score >= threshold` + `behavior_change` | `create_redline` (crType from spec driver) |
| `driver: manual` | review |
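
The decision rules can be written as one deterministic function. The sketch below is an interpretation, not the project's code: it evaluates the rows in the order listed, it reads "crType from spec driver" as `driver: spec` giving `amend_code` and `driver: code` giving `amend_spec`, and it sends any combination the rules do not cover to review. All type and field names are assumptions.

```typescript
type DriftType =
  | "refactor"
  | "implementation_change"
  | "behavior_change"
  | "spec_violation"
  | "spec_outdated";
type Driver = "spec" | "code" | "manual";
type CrType = "amend_code" | "amend_spec";

type Decision =
  | { action: "log_only"; redlineStored: boolean }
  | { action: "create_redline"; crType: CrType }
  | { action: "review" };

interface DecisionInput {
  intentScore: number;
  confidence: number;
  driftType: DriftType;
  driver: Driver;
  threshold: number; // sensitivity threshold for this spec
  confidenceFloor: number;
}

function decide(i: DecisionInput): Decision {
  // Low-confidence classifications are logged and never acted on.
  if (i.confidence < i.confidenceFloor) return { action: "log_only", redlineStored: false };
  // Low-intent changes keep a Redline for audit but are not surfaced.
  if (i.intentScore < i.threshold) return { action: "log_only", redlineStored: true };
  if (i.driftType === "spec_violation" && i.driver === "spec") {
    return { action: "create_redline", crType: "amend_code" };
  }
  if (i.driftType === "spec_outdated" && i.driver === "code") {
    return { action: "create_redline", crType: "amend_spec" };
  }
  if (i.driftType === "behavior_change" && i.driver !== "manual") {
    // Assumed mapping: the driver names the side that is authoritative.
    return { action: "create_redline", crType: i.driver === "spec" ? "amend_code" : "amend_spec" };
  }
  // driver: manual, plus any combination the rules leave open (assumption).
  return { action: "review" };
}
```

Fed the first worked example (score 0.92, `spec_violation`, `driver: spec`), this yields `create_redline` with `amend_code`, matching the flow shown earlier.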
Three strategies, tried in order:
- Explicit — source provides a spec ID directly. Auto-linked. Confidence 1.0.
- Deterministic — concept + file path heuristics match the change to a spec. Auto-linked if ≥0.8 confidence. Confirmation surfaced if multiple match or confidence < 0.8.
- LLM-Inferred — LLM receives the CR description + all open spec summaries and suggests matches. Always requires human confirmation. Never silent.
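
The three linking strategies can be chained as in the sketch below. It takes the deterministic and LLM candidates as inputs rather than computing them, since the heuristics and the LLM prompt are not described here; `SpecCandidate`, `LinkResult`, and `linkSpec` are hypothetical names. The 0.8 auto-link cutoff comes from the list above.

```typescript
interface SpecCandidate {
  specId: string;
  confidence: number;
}

interface LinkResult {
  specId: string | null;
  method: "explicit" | "deterministic" | "llm_inferred" | "none";
  confidence: number;
  needsConfirmation: boolean;
}

function best(candidates: SpecCandidate[]): SpecCandidate {
  // Caller guarantees a non-empty array.
  return candidates.reduce((a, b) => (b.confidence > a.confidence ? b : a));
}

function linkSpec(
  explicitId: string | undefined,
  deterministic: SpecCandidate[],
  llmSuggested: SpecCandidate[],
): LinkResult {
  // 1. Explicit: the source names the spec directly.
  if (explicitId) {
    return { specId: explicitId, method: "explicit", confidence: 1.0, needsConfirmation: false };
  }
  // 2. Deterministic: auto-link only a single match at >= 0.8 confidence.
  if (deterministic.length > 0) {
    const top = best(deterministic);
    const auto = deterministic.length === 1 && top.confidence >= 0.8;
    return {
      specId: top.specId,
      method: "deterministic",
      confidence: top.confidence,
      needsConfirmation: !auto,
    };
  }
  // 3. LLM-inferred: never silent, always confirmed by a human.
  if (llmSuggested.length > 0) {
    const top = best(llmSuggested);
    return {
      specId: top.specId,
      method: "llm_inferred",
      confidence: top.confidence,
      needsConfirmation: true,
    };
  }
  return { specId: null, method: "none", confidence: 0, needsConfirmation: true };
}
```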
Two orthogonal axes of context:
Axis 1 — HOW MUCH OF THE CHANGED FILE you see (escalation tiers):
IR only ──→ + diff hunks ──→ + full source ──→ human annotation
Axis 2 — HOW MUCH OF THE BROADER REPO you understand (RAG):
nothing ──→ git history ──→ + symbol graph ──→ + vector search
RAG operates on two indexes — one over specs, one over code — so LLM-2 sees both sides of the contract when classifying intent:
| Index | Technology | What it answers |
|---|---|---|
| Symbol Graph | ts-morph traversal | Who calls this function? What types does it use? |
| Git History | `git log -S` / `git log -L` | Why was this guard added? Was it a security fix? |
| Lexical (BM25) | MiniSearch (in-process) | Which test files reference this entity? |
| Vector / Spec (RAG-1) | pgvector + `text-embedding-3-small` | Which spec content is relevant to this code change? |
| Vector / Code (RAG-2) | pgvector + `text-embedding-3-small` | Which code context is relevant to the spec? |
| Trigger | How |
|---|---|
| Git commit | Post-commit hook → POST /api/events/commit |
| CI pipeline | POST /api/events/check (manual trigger) |
| Direct API | POST /api/change-requests |
| GitHub webhook | POST /api/webhooks/github |
| Jira webhook | POST /api/webhooks/jira |
| CLI | spec-drifter check |
| Method | Path | Description |
|---|---|---|
| POST | `/api/specs` | Create a spec |
| GET | `/api/specs` | List all specs |
| GET | `/api/specs/:id` | Get live spec (with amendments) |
| GET | `/api/specs/:id/history` | Spec changelog |
| PATCH | `/api/specs/:id` | Update status / driver |
| Method | Path | Description |
|---|---|---|
| GET | `/api/redlines` | List Redlines (filter by status, file, entity, minConfidence) |
| GET | `/api/redlines/:id` | Get a single Redline |
| POST | `/api/redlines/scan` | Scan unlinked entities for spec proposals |
| PATCH | `/api/redlines/:id` | Review a Redline (approve / reject / merge) |
| DELETE | `/api/redlines/:id` | Hard-delete |
Reviewing a Redline:
// Approve (creates Change Request)
{ "status": "accepted_as_draft", "reviewedBy": "kshaw", "reviewNote": "Confirmed" }
// Reject
{ "status": "rejected", "reviewedBy": "kshaw", "reviewNote": "Implementation change only" }
// Merge into existing spec
{ "status": "merged_into_existing", "specId": "SPEC_AUTH_LOGIN", "reviewedBy": "kshaw" }

| Method | Path | Description |
|---|---|---|
| POST | `/api/change-requests` | Create a CR |
| GET | `/api/change-requests` | List CRs (`?driftGroupId=` supported) |
| GET | `/api/change-requests/pending` | Pending CRs |
| GET | `/api/change-requests/groups/:groupId` | All CRs from one trigger |
| POST | `/api/change-requests/:id/resolve` | Resolve a CR (accept/reject/confirm link) |
| Method | Path | Description |
|---|---|---|
| POST | `/api/events/commit` | Trigger from git hook |
| POST | `/api/events/check` | Manual trigger |
| POST | `/api/webhooks/github` | GitHub webhook |
| POST | `/api/webhooks/jira` | Jira webhook |
| POST | `/api/webhooks/linear` | Linear webhook |
┌──────────────┐ ┌──────────────────────┐ ┌──────────────────┐
│ specs │────→│ change_requests │────→│ drift_events │
├──────────────┤ ├──────────────────────┤ ├──────────────────┤
│ id │ │ id │ │ id │
│ content │ │ spec_id │ │ change_request_id│
│ status │ │ source │ │ entity_name │
│ driver │ │ type │ │ file │
│ version │ │ description │ │ drift_type │
│ concepts[] │ │ status │ │ intent_score │
│ acceptance[] │ │ link_method │ │ confidence │
│ sensitivity │ │ link_confidence │ │ reason │
│ score │ │ drift_group_id │ └──────────────────┘
└──────────────┘ └──────────────────────┘
┌─────────────────────┐
│ redline_notes │
├─────────────────────┤
│ id │
│ source │ ← drift_detected | unlinked_entity | on_demand | doc_extraction
│ file │
│ entity │
│ cr_type │ ← amend_code | amend_spec
│ proposed_content │
│ confidence │
│ intent_score │ ← 0.0–1.0 from LLM-2
│ intent_change_summary│ ← LLM-2 plain-language explanation
│ status │ ← proposed → accepted_as_draft | rejected | merged
│ reviewed_by │
│ commit_ref │
│ drift_group_id │
└─────────────────────┘
| Component | Choice |
|---|---|
| Language | TypeScript |
| Runtime | Node.js |
| API Framework | Fastify |
| Database | PostgreSQL |
| Queue | BullMQ (Redis) — in-memory fallback for dev |
| AST Parsing (TS) | ts-morph (type-resolved) |
| AST Parsing (other) | tree-sitter (pluggable registry) |
| LLM | Pluggable — OpenAI, Anthropic, Ollama |
| Validation | Zod |
| CLI | Commander.js |
Git Hook / API / Webhook / CLI
│
▼
┌───────────┐
│ Fastify │
│ API │
└─────┬─────┘
│
▼
┌───────────┐
│ BullMQ │
│ Queue │
└─────┬─────┘
│
▼
┌──────────────────────┐
│ Worker │
│ ┌────────────────┐ │
│ │ Diff Analyzer │ │ ← git diff → file list
│ ├────────────────┤ │
│ │ Semantic │ │ ← AST → structured IR (polyglot)
│ │ Extractor │ │
│ ├────────────────┤ │
│ │ Spec Matcher │ │ ← concept + file path fan-out
│ ├────────────────┤ │
│ │ Intent │ │ ← LLM: did GOAL change?
│ │ Classifier │ │
│ ├────────────────┤ │
│ │ Decision │ │ ← deterministic rules
│ │ Engine │ │
│ ├────────────────┤ │
│ │ Redline Agent │ │ ← mandatory Gate (human or agent)
│ └────────────────┘ │
└──────────┬───────────┘
│
▼
┌───────────┐
│ PostgreSQL│
│ specs │
│ redlines │
│ change_ │
│ requests │
└───────────┘
- LLM classifies intent, never decides actions: all decisions are deterministic rules
- LLM-2 must return structured JSON (Zod-validated), including mandatory `intent_score` (0.0–1.0) and `drift_type`
- Deterministic layers (AST, entity resolution) always run before the LLM
- Confidence thresholds enforced per spec: low-confidence results are logged, not acted on
- No Change Request is ever created directly from drift detection: Redlines are the mandatory gate
- All decisions are auditable: full history in `drift_events`
- Webhook payloads are validated and normalized before processing
- Redline `reviewedBy` is typed: can be a human or an agent, and both are recorded
| Module | File | Status |
|---|---|---|
| Semantic extractor (facade) | `src/pipeline/semantic-extractor.ts` | ✅ |
| Extractor plugin interface | `src/pipeline/extractors/base.ts` | ✅ |
| Extractor registry | `src/pipeline/extractors/registry.ts` | ✅ 20 languages registered |
| TypeScript extractor | `src/pipeline/extractors/typescript.ts` | ✅ ts-morph + JSDoc diffing |
| Tiered intent classifier | `src/pipeline/drift-classifier.ts` | ✅ intent_score classification, tiered escalation |
| Context bundle | `src/pipeline/context-bundle.ts` | ✅ |
| Drift orchestrator | `src/pipeline/drift-orchestrator.ts` | ✅ Full pipeline, Redline gate |
| Spec matcher | `src/pipeline/spec-matcher.ts` | ✅ Concept + file path |
| Spec linker | `src/pipeline/spec-linker.ts` | ✅ Three-strategy linking |
| Decision engine | `src/pipeline/decision-engine.ts` | ✅ Gates on intent_score threshold, create_redline |
| Sensitivity | `src/pipeline/sensitivity.ts` | ✅ Curve + presets |
| Redline agent | `src/pipeline/redline-agent.ts` | ✅ Scoring, proposal, scan, promote |
| All API routes | `src/api/routes/` | ✅ Fastify routes wired |
| RAG types + interfaces | `src/rag/` | ✅ Full type definitions |
| RAG query planner | `src/rag/query-planner.ts` | ✅ Task-aware planner |
| RAG retriever | `src/rag/retriever.ts` | ✅ Parallel retrieval + reranking |
| Schema + types | `src/types/schema.ts` | ✅ Zod schemas |
| Module | Notes |
|---|---|
| Database layer | All routes have `// TODO: query database` stubs |
| PostgreSQL migrations | Schema designed, not written |
| `diff-analyzer.ts` | simple-git integration: throws NotImplemented |
| tree-sitter extractors | Stubs registered, return `[]`, fall back to raw diff |
| Spec file watcher | Watch `specs/` for direct edits |
| RAG index backends | Interfaces only: no MiniSearch/pgvector impl |
| CLI tool | Commander.js wiring not started |
- Deterministic first, AI second — AST and spec matching run before any LLM call
- Intent drift is not a code change — implementation evolution is not drift unless the goal changed
- Specs define contracts at any level — intent, outcomes, or prescribed implementation — all are enforceable
- Redline is the mandatory gate — no CR is ever created without Gate approval (human or authorized agent)
- Roles are not clean — AI and humans both participate in classification, review, and spec authoring; the line is about accountability checkpoints, not capability
- Mapping is inferred, not declared — no manual file-to-spec wiring
- Always auditable — every drift event, every decision, every CR is logged
- The spec is alive — it evolves through change requests, not ad-hoc edits
- Weak specs are flagged, not blocked — a low drift-resistance score is surfaced at the Gate; Redlines against weak specs carry less weight
npm install
npm run build
# Set env vars
export OPENAI_API_KEY=sk-...
export DATABASE_URL=postgres://localhost/spec-drifter
# Start server
npm start
# Create a spec
curl -X POST http://localhost:3000/api/specs \
-H "Content-Type: application/json" \
-d '{ "id": "SPEC_AUTH_LOGIN", "driver": "spec", "concepts": ["authentication"], "acceptance": ["User receives JWT on valid credentials", "Invalid password returns 401"] }'
# Check for drift manually
curl -X POST http://localhost:3000/api/events/check
# List proposed Redlines
curl "http://localhost:3000/api/redlines?status=proposed"

- TLDR.md: Short summary (problem, solution, use cases)
- PITCH.md: Sales pitch (humans today, agentic workflows tomorrow)
- VALUE-ADDS.md: Near-term features and easy wins
- CONCERNS.md: Known risks, open problems, things worth solving