Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 61 additions & 0 deletions providers/CONTRACT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
# Provider normalization contract

The `llm-dark-patterns` detectors read two things from a model turn: the assistant's
**closeout text** and the **tool calls it actually emitted**. To run them against models
other than Claude (the provider-invariance question — does operator-side discipline
generalize across models?), each provider's raw transcript is normalized to one shape.

## `NormalizedTurn`

```
NormalizedTurn:
assistant_message: str # the model's final/closeout text only
tool_calls: list[ {name: str, arguments: str|dict} ] # calls the model actually emitted
```

- `assistant_message` is the answer text only — never reasoning/CoT, never tool output.
- `tool_calls` is what crossed the tool boundary. A turn that *claims* a dispatch in text
but emits no call has `tool_calls == []` — that gap is the dispatch-fabrication signal.

## Adapter interface

An adapter is a pure function `raw_payload (dict) -> NormalizedTurn`. No network, no deps.
Register it in `normalize.ADAPTERS[name]`. Current adapters: `claude_hook`,
`openai_chat`, `openai_responses`, `kimi`.

## Mapping table (formats verified live 2026-05-26)

| Field | Claude Code hook | OpenAI Chat | OpenAI Responses | Kimi / Moonshot |
|---|---|---|---|---|
| assistant text | `.last_assistant_message` (¹) | `choices[0].message.content` | `output[].type=="message"` → `content[].output_text.text` | `choices[0].message.content` (NOT `reasoning_content`) |
| tool calls | PreToolUse `tool_name`/`tool_input`, or transcript `tool_use` blocks (²) | `choices[0].message.tool_calls[].function.{name,arguments}` | `output[].type=="function_call"` → flat `{name, arguments}` | OpenAI-compatible `tool_calls`; skip `type=="builtin_function"` server echoes |

¹ **Flagged (version-gated):** `last_assistant_message` appears in 2026 community Stop-hook
implementations but was not rendered on the official hooks page fetched 2026-05-26. The
adapter falls back to the last assistant turn in an inline `messages`/`transcript`.

² **Flagged (insufficient_data):** the exact `tool_input` key names for the `Task`/Agent
dispatch tool were not quoted in the official docs; `arguments` is treated as an opaque
object (detectors use the tool *name* and *count*, not its argument schema).

## Conformance bar (what a NEW adapter must pass)

`providers/conformance_test.py` must stay green after adding an adapter. Concretely, for
the same logical turn expressed in your provider's envelope:

1. `assistant_message` equals the logical text (answer only — exclude reasoning/CoT).
2. `tool_names()` equals the set of calls actually emitted (skip server-side/builtin echoes).
3. `lib/count_drift.py` (or any text detector) returns the SAME verdict as on the Claude
envelope — this is the provider-invariance property the substrate exists to prove.
4. A claimed-but-uncalled dispatch yields `tool_calls == []`.

Fixtures in `providers/fixtures.py` are documented-format synthetic; a real adapter
should additionally be validated against live-captured responses (that live validation is
the adapter author's responsibility — e.g. @yurukusa's Kimi K2 / OpenAI adapter).

## Status / scope

This is the offline normalization layer for cross-model EVAL. It does not re-plumb the
hooks' hot path (they keep their Claude-hook entry) and it makes no API calls. A labeled
cross-model corpus and actual per-model F1 numbers are a separate follow-up; this contract
is the prerequisite that makes that pass runnable.
37 changes: 37 additions & 0 deletions providers/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# providers/ — run the dark-pattern detectors cross-model

This directory is the Claude-side substrate for **provider-invariance**: normalizing any
provider's model turn into one shape so the `llm-dark-patterns` detectors run unchanged
against OpenAI / Kimi K2 / etc., not just Claude. It is the prerequisite for a cross-model
F1 pass (the question from anthropics/claude-code#61167: *does operator-side discipline
generalize across models, or lean on Claude-specific guard behavior?*).

## Files
- `normalize.py` — `NormalizedTurn` + adapters (`claude_hook`, `openai_chat`,
`openai_responses`, `kimi`). Pure stdlib, no network.
- `fixtures.py` — the same logical turns in each provider's documented raw envelope.
- `conformance_test.py` — proves cross-envelope equivalence + identical detector verdicts.
- `CONTRACT.md` — the schema, mapping table, and the conformance bar for a new adapter.

## Run
```
python3 providers/conformance_test.py # exit 0 = all envelopes agree
```

## Adding an adapter (e.g. a live Kimi K2 / OpenAI adapter)
1. Write `raw_payload -> NormalizedTurn` in `normalize.py`; register in `ADAPTERS`.
2. Add your provider's raw envelope to each logical turn in `fixtures.py`.
3. Keep `conformance_test.py` green: same `assistant_message`, same `tool_names()`, same
`count_drift` verdict as the Claude envelope; claimed-but-uncalled dispatch → `[]`.
4. Validate against live-captured responses on your side (the synthetic fixtures here are
format-faithful, not live-captured).

The intended division of labor (per #61167): this contract + the Claude reference adapter
are maintained here; @yurukusa's Kimi K2 / OpenAI tool-format adapter conforms to it, and
the falsifiable provider-invariance F1 pass becomes runnable once a labeled cross-model
corpus exists.

## What this is not
Not a hot-path rewrite (the hooks keep their Claude-hook entry), not a network client, and
not the F1 result itself — it is the offline normalization layer that makes the cross-model
comparison possible.
151 changes: 151 additions & 0 deletions providers/SPEC.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,151 @@
# SPEC — providers/: provider-neutral normalization for cross-model hook runs

Status: ACTIVE (pre-implementation). Author: waitdeadai. Date: 2026-05-26.
Origin: the provider-invariance thread with @yurukusa (anthropics/claude-code#61167)
and @nvst18's request to trial non-Claude models under the same runtime verification.
This is the Claude-side substrate contract that lets the `llm-dark-patterns` detectors
run against transcripts from other providers — the prerequisite for any cross-model F1
pass. @yurukusa builds the live Kimi K2 / OpenAI adapter against this contract.

## 1. Problem Statement

The suite's detectors consume Claude Code's hook JSON (`.last_assistant_message`, tool
events). To test provider-invariance ("does operator-side discipline generalize across
models?") the detectors must run against OpenAI / Kimi K2 transcripts too. There is no
provider-neutral shape today, so the substrate cannot run a second-model pass.

## 2. Success Criteria (measurable)

- **SC1 (normalization):** `providers/normalize.py` maps raw fixtures from Claude Code
hook JSON, OpenAI Chat Completions, OpenAI Responses, and Kimi/Moonshot into a single
`NormalizedTurn` (`assistant_message: str`, `tool_calls: [{name, arguments}]`).
Verified by `providers/conformance_test.py` asserting field-level equivalence.
- **SC2 (provider-invariance proof — the load-bearing one):** the SAME logical closeout
expressed in every supported envelope normalizes to the same `assistant_message`, and
running an existing detector (`lib/count_drift.py`) over each yields an IDENTICAL
verdict. A count-drift positive → `block` on all envelopes; a negative → `pass` on all.
This demonstrates the detector is invariant to the provider envelope (the substrate
property the whole thread needs). Verified by the conformance test.
- **SC3 (tool-call extraction):** for a turn that emits tool calls, every adapter yields
the same `tool_calls` name list; for a turn that CLAIMS a dispatch in text but emits no
call, `tool_calls == []` on every envelope (the dispatch-fabrication signal — what
@yurukusa's dispatch-receipt needs cross-model). Verified by fixtures.
- **SC4 (no deps, no network):** pure Python stdlib; no API calls; runs offline.
Verified by inspection + the test running with no network.
- **SC5 (contract documented):** `providers/CONTRACT.md` specifies the `NormalizedTurn`
schema, the adapter interface, the per-provider mapping table, and the conformance bar
a NEW adapter must pass — so @yurukusa's adapter has a spec. The two research-uncertain
items (Claude `last_assistant_message` is version-gated with a transcript-parse
fallback; the `Task` tool_input key names are opaque) are flagged explicitly.
- **SC6 (no regression):** additive under `providers/`; the bundled-plugin smoke and
stress CI still pass.

Non-criteria: live API calls to OpenAI/Kimi (out — needs creds; @yurukusa owns the live
adapter); a labeled cross-model corpus / actual F1 numbers (out — separate follow-up).

## 3. Scope

**In scope:**
- `providers/normalize.py` — `NormalizedTurn` dataclass + `from_claude_hook`,
`from_openai_chat`, `from_openai_responses`, `from_kimi` (Kimi reuses the OpenAI path
with documented quirks). Pure stdlib.
- `providers/fixtures/` — the same logical turns (a count-drift positive, a clean
negative, a dispatch-claim-without-call) in each provider's raw envelope.
- `providers/conformance_test.py` — asserts SC1/SC2/SC3 (cross-envelope equivalence +
identical detector verdicts + tool-call extraction).
- `providers/CONTRACT.md` + `providers/README.md` — schema, mapping table, adapter
conformance bar, how @yurukusa's Kimi/OpenAI adapter plugs in, flagged uncertainties.

**Out of scope (this PR):**
- Live API calls / network (creds; @yurukusa's live adapter).
- A cross-model labeled corpus and actual provider-invariance F1 numbers (follow-up once
this substrate + a corpus exist).
- Re-plumbing every hook to read `NormalizedTurn` (the hooks keep their Claude-hook entry
path; this is the offline normalization layer for cross-model EVAL, not a hot-path
rewrite).

## 4. Design

`NormalizedTurn`:
- `assistant_message: str` — the model's final/closeout text (what text detectors read).
- `tool_calls: list[dict]` — `[{ "name": str, "arguments": str|dict }]`, the calls the
assistant actually emitted (count + names; for dispatch-fabrication / count work).

Mapping (from research, accessed 2026-05-26; live-verified unless flagged):
- **Claude Code hook**: `assistant_message` ← `.last_assistant_message` (version-gated;
fallback: parse `.transcript_path` JSONL for the last assistant turn — FLAGGED
uncertain S1/S3); `tool_calls` ← PreToolUse `tool_name`/`tool_input` or transcript
tool-use blocks (`Task` tool_input keys opaque — FLAGGED insufficient_data).
- **OpenAI Chat Completions**: `assistant_message` ← `choices[0].message.content`;
`tool_calls` ← `choices[0].message.tool_calls[].function.{name, arguments}`.
- **OpenAI Responses**: `assistant_message` ← output text items; `tool_calls` ←
`function_call` items `{name, arguments}` (note `call_id`, flat name/arguments).
- **Kimi/Moonshot**: OpenAI-compatible `tool_calls`; quirks: semantic ids (`"search:0"`),
`type:"builtin_function"` server-executed echoes, `reasoning_content` separate from
`content` (must not be folded into `assistant_message`).

Divergences that matter: (a) function vs tool nomenclature; (b) Responses flat vs Chat
nested arguments; (c) Kimi `reasoning_content` must be excluded from `assistant_message`;
(d) "claimed dispatch, no call" looks identical across all → `tool_calls == []` (the
fabrication signal).

## 5. Agent-Native Estimate

- Estimate type: agent-native wall-clock.
- Topology: local single loop (one tightly-coupled normalizer + test); not parallelizable.
- Capacity evidence: local-bound; lanes don't reduce the critical path.
- Critical path: SPEC → /specqa → /introspect → normalize.py → fixtures → conformance
test → /verify.
- Agent wall-clock: optimistic ~5 build/verify cycles, likely ~8, pessimistic ~12 (if
envelope quirks need a tuning pass).
- Agent-hours: low. Human touch time: review + merge. Calendar blockers: none (additive,
feature branch, no `.github/workflows` touched → no scope wall).
- Confidence: medium — downgrade reason: two research-flagged uncertainties (Claude
`last_assistant_message` version-gating, `Task` tool_input opacity); the design routes
around both (fallbacks + opaque treatment), so they don't block.

## 6. Implementation Plan

### Task 1: `providers/normalize.py`
DoD: `NormalizedTurn` dataclass; the four adapters; Kimi excludes `reasoning_content`;
graceful on missing fields (returns empty message / empty tool_calls, never raises).

### Task 2: `providers/fixtures/`
DoD: three logical turns × four envelopes: (a) count-drift positive ("Six findings:" + 5
items), (b) clean negative, (c) dispatch-claim-without-call. Raw JSON per envelope.

### Task 3: `providers/conformance_test.py`
DoD: asserts SC1 (equivalent NormalizedTurn across envelopes), SC2 (identical
`count_drift.analyze` verdict across envelopes), SC3 (tool-call name lists match;
fabrication fixture → `tool_calls == []` everywhere). Bash/py harness, exits 0/1.

### Task 4: `providers/CONTRACT.md` + `README.md`
DoD: schema, mapping table, adapter conformance bar, plug-in instructions for @yurukusa's
adapter, flagged uncertainties.

## 7. Verification

- SC1/SC2/SC3 → `python3 providers/conformance_test.py` exits 0.
- SC2 specifically → the count-drift positive yields `block` on Claude/OpenAI-chat/
OpenAI-responses/Kimi envelopes; the negative yields `pass` on all (printed).
- SC4 → inspection: no imports beyond stdlib; test runs with no network.
- SC6 → bundled-plugin smoke + stress CI green on the PR.

## 8. Rollback Plan

1. Isolated on `feature/providers-shim`; nothing on `main` until merge.
2. Purely additive under `providers/` — no existing file changed; deleting the dir is a
complete rollback.
3. `git revert <sha>` post-merge; `git branch -D feature/providers-shim` pre-merge.
4. Verify rollback: bundled-plugin smoke passes; `ls providers/` absent.

## Source ledger (deepresearch, accessed 2026-05-26)

- OpenAI Chat Completions / Responses tool-call shapes — live-verified, official OpenAI docs.
- Kimi/Moonshot tool calls — live-verified OpenAI-compatible with `builtin_function` +
semantic-id + `reasoning_content` quirks.
- Claude Code hook input (`hook_event_name`, `stop_hook_active`, PreToolUse
`tool_name`/`tool_input`/`tool_use_id`) — live-verified, official hooks doc.
- FLAGGED uncertain: `last_assistant_message` in Stop/SubagentStop (version-gated per
community impls; not on the official page fetched) → transcript-parse fallback;
`Task` tool_input key names → insufficient_data → treated as opaque.
84 changes: 84 additions & 0 deletions providers/conformance_test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
#!/usr/bin/env python3
"""providers/conformance_test.py — provider-invariance conformance.

Asserts that every supported envelope normalizes the same logical turn to the same
`NormalizedTurn`, that an existing detector (`lib/count_drift.py`) returns an IDENTICAL
verdict across envelopes (the substrate-level provider-invariance property), and that a
dispatch claimed in text but never emitted as a call yields `tool_calls == []` on every
envelope (the dispatch-fabrication signal). Pure stdlib, no network. Exit 0/1.
"""
import importlib.util
import os
import sys

HERE = os.path.dirname(os.path.abspath(__file__))
ROOT = os.path.abspath(os.path.join(HERE, ".."))


def _load(name, path):
spec = importlib.util.spec_from_file_location(name, path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
return mod


norm = _load("normalize", os.path.join(HERE, "normalize.py"))
fx = _load("fixtures", os.path.join(HERE, "fixtures.py"))
cd = _load("count_drift", os.path.join(ROOT, "lib", "count_drift.py"))

PASS = 0
FAIL = 0
FAILS = []


def check(cond, desc):
global PASS, FAIL
if cond:
PASS += 1
print(" PASS %s" % desc)
else:
FAIL += 1
FAILS.append(desc)
print(" FAIL %s" % desc)


for fid, spec in fx.FIXTURES.items():
logical = spec["logical"]
norms = {p: norm.ADAPTERS[p](raw) for p, raw in spec["envelopes"].items()}
n_env = len(norms)
# SC1: assistant_message identical across envelopes and equal to the logical text.
check(all(n.assistant_message == logical["text"] for n in norms.values()),
"[%s] assistant_message identical across %d envelopes" % (fid, n_env))
# SC3: tool-call names identical across envelopes and equal to the logical set.
check(all(n.tool_names() == logical["tool_names"] for n in norms.values()),
"[%s] tool_calls identical across envelopes (== %s)" % (fid, logical["tool_names"]))
# SC2: count_drift verdict identical across envelopes.
verdicts = {p: cd.analyze(n.assistant_message)["decision"] for p, n in norms.items()}
check(len(set(verdicts.values())) == 1,
"[%s] count_drift verdict identical across envelopes (%s)"
% (fid, sorted(set(verdicts.values()))))

# SC2 sharper: the positive blocks and the negative passes, each through a NON-Claude envelope.
pos = norm.from_openai_chat(fx.FIXTURES["countdrift_positive"]["envelopes"]["openai_chat"])
neg = norm.from_kimi(fx.FIXTURES["clean_negative"]["envelopes"]["kimi"])
check(cd.analyze(pos.assistant_message)["decision"] == "block",
"count-drift positive -> block via OpenAI envelope")
check(cd.analyze(neg.assistant_message)["decision"] == "pass",
"clean negative -> pass via Kimi envelope")

# Fabrication: claimed dispatch, zero tool calls on every envelope (Kimi builtin echo skipped).
fab = fx.FIXTURES["dispatch_fabricated"]["envelopes"]
check(all(norm.ADAPTERS[p](raw).tool_calls == [] for p, raw in fab.items()),
"dispatch fabrication: tool_calls==[] on every envelope (the fabrication signal)")

# Kimi reasoning_content must not leak into the answer text.
kimi_pos = norm.from_kimi(fx.FIXTURES["countdrift_positive"]["envelopes"]["kimi"])
check("internal CoT" not in kimi_pos.assistant_message,
"kimi reasoning_content excluded from assistant_message")

print("\nPASS=%d FAIL=%d" % (PASS, FAIL))
if FAIL:
print("FAILURES: " + "; ".join(FAILS))
sys.exit(1)
print("ALL CONFORMANCE CHECKS PASSED")
sys.exit(0)
Loading
Loading