4 changes: 3 additions & 1 deletion CLAIM_LEDGER.md
@@ -1,6 +1,6 @@
# Claim Ledger

Access date: 2026-05-13.
Access date: 2026-05-31.

| Claim | Status | Action |
|---|---|---|
@@ -24,3 +24,5 @@ Access date: 2026-05-13.
| `agentcloseout.env` is safe configuration because it is local. | corrected | The adapter now parses only allowlisted keys instead of shell-sourcing the file; tamper guard blocks normal Claude Code edits to hook/env/rule/engine surfaces. |
| Evidence markers prove completion. | corrected | Evidence markers only satisfy a closeout-text contract. They do not prove command success, deployment correctness, or release readiness; external verification remains required. |
| `Changed files:` or `checked` is enough evidence for implementation closeout. | dropped | v0.2 engine hardening requires concrete command, verification, trace, or read-only-audit evidence; weak markers and explicit missing evidence block completion claims. |
| The field-observation lane's 80 contributed records are released benchmark data. | conditional | They are `field_observation_intake/` candidate records with `label_final=null`; not promoted to a released `data/` lane until two human annotation passes + adjudication. |
| Contributed real-session records can be ingested on the contributor's say-so. | corrected | The 42 ingested only after explicit contributor redaction sign-off (2026-05-31); the 38 only after a verification-clean pass + identifier generalization. Attribution to `@nvst18` is retained per contribution terms; provenance is in `field_observation_intake/source_registry.json` + `derived_fixture_manifest.jsonl`. |
17 changes: 17 additions & 0 deletions DATASET_CARD.md
@@ -53,6 +53,23 @@ The v0.3 public-derived lane uses `public_data_intake/source_registry.json` and
license decision, privacy status, transform, reviewer, and release eligibility.
Raw public trace rows are not persisted by default.

The **field-observation lane** (`field_observation_intake/`) adds 80 attributed
real-session candidate records contributed by `@nvst18` (Nofyah), ingested
2026-05-31 after contributor sign-off
([`ianymu/recognition-without-arrest#2`](https://github.com/ianymu/recognition-without-arrest/pull/2)),
as two separately-tagged sub-batches: **42 dispatch-fabrication** records
(`category=sycophancy`; real healthcare-deployment transcript, maintainer-redacted
and contributor-signed-off) and **38 hollow-code** records
(`category=hollow_code`, incl. `safety_prompt_bypass`; verification-clean, with
internal class/method identifiers generalized to role-preserving surrogates per
the contributor's request). The lane uses
`field_observation_intake/source_registry.json` and
`derived_fixture_manifest.jsonl` for provenance, and the `subcategory` /
`fix_description` schema extensions in `field_observation_intake/LANE_SCHEMA.md`.
These are **candidate** records: `label_final` is `null` pending two human
annotation passes + adjudication, and the lane is not promoted into a released
`data/` lane until then.

Future records may be:

- synthetically generated adversarial positives;
72 changes: 72 additions & 0 deletions field_observation_intake/LANE_SCHEMA.md
@@ -0,0 +1,72 @@
# Field-Observation Lane Record Schema

A `field_observation` record is a base AgentCloseoutBench release-candidate
record (see `SPEC.md` §3 Record Schema) plus the field-observation extensions
below. It is staged under `field_observation_intake/` and is **not** part of the
released `data/` lanes until license, PII/redaction review, contributor
sign-off, and human adjudication are complete.

## Base record (unchanged from SPEC.md §3)

`id`, `category`, `label_candidate`, `label_final`, `task_type`,
`task_description`, `session_summary`, `closeout_text`, `generation_method`,
`model`, `temperature`, `prompt_hash`, `source_provenance`, `license_source`,
`split`, `created_at`, `notes`, plus the status fields used elsewhere in `data/`
(`adjudication_status`, `annotation_status`, `pii_review_status`,
`redaction_status`, `source_type`, `source_id`, `source_url`).

For field observations:

- `source_type` = `"field_observation"`.
- `generation_method` = `"contributed_real_session"`.
- `label_final` stays `null` until two independent human annotation passes plus
adjudication, per the repo annotation contract.
- `redaction_status` and `pii_review_status` are per-record and must reflect the
source's registry entry (e.g. `redacted_pending_signoff`,
`verified_clean_no_pii`).
- `source_id` references a `field_observation_intake/source_registry.json` entry.
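The fixed-value rules above could be applied at ingest time roughly as follows. This is a minimal sketch, not the repo's ingest script: the function name `normalize_field_observation` is hypothetical, and only the field names and values come from this schema.

```python
from copy import deepcopy


def normalize_field_observation(record: dict, source_id: str) -> dict:
    """Return a copy of a contributed record with the lane's fixed values applied.

    Hypothetical helper: the name and signature are assumptions for
    illustration; the field names and values are the ones listed above.
    """
    out = deepcopy(record)  # never mutate the contributor's original payload
    out["source_type"] = "field_observation"
    out["generation_method"] = "contributed_real_session"
    out["label_final"] = None  # stays null until two annotation passes + adjudication
    out["source_id"] = source_id  # must match an entry in source_registry.json
    return out
```

Working on a copy keeps the contributed payload intact, so any contributed `label_candidate` survives as a candidate signal while `label_final` is reset.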

## Extensions (new for this lane)

| Field | Type | Required | Definition |
|---|---|---|---|
| `subcategory` | string | yes | Finer-grained failure type within `category`. `hollow_code` alone is too broad for detector training. Examples: `dispatch_fabrication`, `safety_prompt_bypass`, `phantom_completion`. |
| `fix_description` | string \| null | yes | Ground-truth remediation / fix provenance for the documented failure, where known. Real provenance (e.g. a commit ref such as `27af4ec`), not a synthesized suggestion. `null` when no fix was recorded. |
| `contributor` | string | yes | Attributed contributor handle (e.g. `nvst18`). Attribution is part of the contribution terms. |
| `contributed_via` | string (URL) | yes | The PR/issue thread the record was contributed through. |

## Why the extensions

- **`subcategory`** keeps the lane usable for detector training: a flat
`hollow_code` label cannot distinguish a `safety_prompt_bypass` (crisis
detection skipped) from a generic empty handler.
- **`fix_description`** is genuine provenance from the contributor's own session
(what the correct fix was), which is exactly the signal a closeout-honesty
benchmark wants to preserve and which synthetic templates cannot supply.

## Validation expectations (for the ingest PR, not this scaffold)

- Every record's `source_id` resolves to a registry entry whose
`release_eligibility` is no longer `blocked_pending_contributor_signoff`.
- No record contains secrets, emails, absolute paths, usernames, hostnames, repo
URLs, or raw tool-output markers (same reject-to-quarantine rule as
`public_data_intake/`).
- `subcategory` and `fix_description` are present on every record;
`fix_description` may be `null` but the key must exist.
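A per-record check for the key-presence and registry expectations might look like the sketch below. It assumes the registry has been loaded into a dict keyed by source id, with a `release_eligibility` value on each entry; that in-memory shape, and the name `validate_record`, are assumptions rather than the repo's actual validator. The secrets/PII scan is deliberately left out of this sketch.

```python
# Keys marked "Required: yes" in the extensions table above.
REQUIRED_KEYS = ("subcategory", "fix_description", "contributor", "contributed_via")
BLOCKED = "blocked_pending_contributor_signoff"


def validate_record(record: dict, registry: dict) -> list[str]:
    """Collect validation errors for one record; an empty list means it passes.

    `registry` is assumed to map source_id -> registry entry (a dict).
    """
    errors = []
    for key in REQUIRED_KEYS:
        # The key must exist even when its value is null (fix_description).
        if key not in record:
            errors.append(f"missing key: {key}")

    entry = registry.get(record.get("source_id"))
    if entry is None:
        errors.append("source_id does not resolve to a registry entry")
    elif entry.get("release_eligibility") == BLOCKED:
        errors.append("source is still blocked pending contributor sign-off")
    return errors
```

Returning a list of messages rather than raising on the first failure lets an ingest run report every problem in a batch at once.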

## Ingest notes (2026-05-31)

- `prompt_hash` is **nullable** for this lane. The 42 lot carries a real
`prompt_hash`; the contributed 38 lot has none, so its records carry
`prompt_hash: null`.
- `generation_method` is set to `"contributed_real_session"` on ingest for both
lots (overriding any contributed value), per the base-record rule above.
- `label_final` is forced to `null` on ingest; the contributed `label_candidate`
is retained as a candidate signal only.
- The 38 lot's internal class/method/const/metric identifiers were generalized
to **role-preserving surrogates** per the contributor's sign-off decision
(e.g. `NaomiGuardrail` → `ContentGuardrail`, `getCrisisText` → `getSafetyText`,
`ScorePtgiWeekly` → `ScoreWeekly`). Syntax is kept valid so the fixtures stay
usable for detector training, and the failure signal (incl. the
`safety_prompt_bypass` `->body` vs `getSystemPrompt()` contrast) is preserved.
The 42 lot is ingested verbatim from the signed-off redacted candidate.
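One way such a generalization could be implemented is a whole-identifier substitution, sketched below. The three mapping pairs are the examples quoted above; the function name and the regex approach are assumptions, not a description of how the ingest was actually performed.

```python
import re

# Example pairs taken from the note above; a real mapping would be longer.
SURROGATES = {
    "NaomiGuardrail": "ContentGuardrail",
    "getCrisisText": "getSafetyText",
    "ScorePtgiWeekly": "ScoreWeekly",
}


def generalize_identifiers(text: str, mapping: dict = SURROGATES) -> str:
    """Replace each mapped identifier with its surrogate, matching whole words only."""
    if not mapping:
        return text
    # Longest names first so a longer identifier is never shadowed by a shorter one.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)
```

Matching on word boundaries leaves the surrounding syntax untouched, which is what keeps the fixtures parseable. The trade-off is that derived names (for example an identifier with an extra suffix) are not rewritten unless they are listed in the mapping explicitly.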
92 changes: 92 additions & 0 deletions field_observation_intake/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
# Field-Observation Intake

This lane turns externally-contributed **real-session field observations** into
deterministic closeout-engine fixtures, under the same discipline as
`public_data_intake/`: no contributor data becomes public corpus data without a
per-source license entry, PII/redaction review, manifest provenance, and
explicit contributor sign-off.

Field observations differ from `public_data_intake/` in one way: the source is a
named collaborator's own session evidence, contributed and attributed by that
collaborator, rather than a published public dataset. The bar for release is
therefore *higher*, not lower — contributor consent and a redaction sign-off are
required release gates.

## Status: INGESTED (2026-05-31)

Both contributor gates cleared on 2026-05-31 (`@nvst18` sign-off on
[`ianymu/recognition-without-arrest#2`](https://github.com/ianymu/recognition-without-arrest/pull/2)),
and both lots from `@nvst18` (Nofyah) are now staged as candidate records under
`candidate_field_observation/`, with one `derived_fixture_manifest.jsonl` row per
record. `label_final` stays `null` pending two annotation passes + adjudication.

- **42 dispatch-fabrication fixtures** (healthcare deployment; phantom dispatches
+ hollow code) — contributed via
[`ianymu/recognition-without-arrest#2`](https://github.com/ianymu/recognition-without-arrest/pull/2).
**Redaction signed off** by the contributor; ingested verbatim from the
redacted candidate (agent codenames + dispatch phrasing preserved as the
fabrication signal, per contributor approval). →
`candidate_field_observation/dispatch_fabrication_42/fixtures.jsonl`
- **38 semantic hollow-code fixtures** (incl. `safety_prompt_bypass`) —
contributed via
[`waitdeadai/llm-dark-patterns#6`](https://github.com/waitdeadai/llm-dark-patterns/issues/6).
Verification pass clean; **internal class/method names generalized to
role-preserving surrogates** per the contributor's sign-off decision (the
failure signal, incl. `safety_prompt_bypass`, is preserved). →
`candidate_field_observation/hollow_code_38/fixtures.jsonl`

Both lots are kept as **separately-tagged sub-batches** (distinct `source_id`,
sub-batch directory, and manifest `transform`) so their provenance never merges:
the 42 is real-transcript-derived + redacted; the 38 is contributed
hollow-code + identifier-generalized.

## Lane contract

**Problem.** Turn two attributed real-session contributions into deterministic
fixtures without materializing any contributor data before consent (now
satisfied).

**In scope.** Source registry, lane record-schema extension, this README, the two
candidate sub-batch payloads, and the derived fixture manifest.

**Out of scope.** Promotion into a released `data/` lane — blocked on the two
human annotation passes + adjudication that flip `label_final` off `null`.

**Privacy gate (satisfied).** The 42 ingested only after contributor redaction
sign-off; the 38 ingested clean + identifier-generalized. A maintainer
reject-to-quarantine scan (emails, absolute paths, repo URLs, secrets, raw
tool-output markers) ran on both sub-batches before commit. The 38 is fully
clean and carries zero original proprietary identifiers; the 42's scan surfaced
only items inside the contributor-approved redaction (short session-id hex, a
generic `github.com/topics` link), surfaced to the contributor for review rather
than altered post-sign-off.
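A reject-to-quarantine scan of this kind can be approximated with a few regular expressions. The sketch below is illustrative only: the patterns, the `QUARANTINE_PATTERNS` name, and the `scan_text` function are assumptions, they cover just three of the listed classes (emails, absolute paths, repo URLs), and they are not the maintainer's actual scanner.

```python
import re

# Deliberately simple patterns; a real scan would also cover secrets and
# raw tool-output markers, and would likely be stricter.
QUARANTINE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "absolute_path": re.compile(r"(?<![\w/:])/(?:home|Users|root|var|etc|tmp)/\S+"),
    "repo_url": re.compile(r"https?://(?:github|gitlab)\.com/\S+"),
}


def scan_text(text: str) -> dict:
    """Return {pattern_name: [matches]} for every pattern that hits; {} if clean."""
    hits = {}
    for name, pattern in QUARANTINE_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits
```

The function only reports hits and never rewrites the text, which matches the handling described above: findings on signed-off material go back to the contributor for review instead of being altered after sign-off.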

**Acceptance (of this ingest).**
- `source_registry.json` parses; both lots' `release_eligibility` is cleared off
`blocked_pending_contributor_signoff`.
- Both sub-batch payloads present; `derived_fixture_manifest.jsonl` has one row
per record (80 total).
- Every record carries `subcategory` + `fix_description` (key present, may be
`null`), `contributor`, `contributed_via`, `source_type=field_observation`,
`label_final=null`.
- The 38 carries no original proprietary class/method identifier.
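The "one manifest row per record" criterion can be checked mechanically. The sketch below assumes each manifest row points back to its record through a `record_id` key and each record has an `id`; the manifest key name and both function names are hypothetical, so adjust them to the real manifest layout.

```python
import json


def parse_jsonl(text: str) -> list:
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def check_manifest_coverage(records: list, manifest_rows: list,
                            manifest_key: str = "record_id") -> list:
    """Return a list of problems; empty means exactly one manifest row per record."""
    record_ids = [r["id"] for r in records]
    manifest_ids = [m[manifest_key] for m in manifest_rows]
    problems = []
    if len(set(record_ids)) != len(record_ids):
        problems.append("duplicate record ids")
    if len(set(manifest_ids)) != len(manifest_ids):
        problems.append("duplicate manifest rows")
    missing = sorted(set(record_ids) - set(manifest_ids))
    extra = sorted(set(manifest_ids) - set(record_ids))
    if missing:
        problems.append(f"records without a manifest row: {missing}")
    if extra:
        problems.append(f"manifest rows without a record: {extra}")
    return problems
```

For this ingest, the records from both sub-batch `fixtures.jsonl` files would be concatenated before the call, and a passing run should see 80 records and 80 manifest rows.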

**Rollback.** This lane is additive and self-contained; delete the
`field_observation_intake/` directory to remove it. No existing released lane,
schema, script, or fixture is modified.

## Ingest provenance (done 2026-05-31)

1. ✅ Both gates cleared in-thread (redaction sign-off on the 42; generalize
decision on the 38) — `ianymu/recognition-without-arrest#2`, 2026-05-31.
2. ✅ Added the redacted 42 + the generalized 38 as candidate records under
`candidate_field_observation/`, each carrying `source_type=field_observation`,
`label_candidate`, `label_final=null`, `pii_review_status`,
`redaction_status`, and the `subcategory` / `fix_description` extensions.
3. ✅ Wrote one `derived_fixture_manifest.jsonl` row per record (source id,
source record hash, transform, license decision, reviewer, release
eligibility).
4. ✅ Updated `DATASET_CARD.md` and `CLAIM_LEDGER.md` to register the lane and
its provenance.
5. ⏳ `label_final` stays `null` until two independent human annotation passes
plus adjudication, per the repo annotation contract.
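For step 3, a manifest row with a stable source record hash could be produced along these lines. This is a sketch under assumptions: the JSON key names, the choice of SHA-256 over canonical JSON, and both function names are illustrative and not confirmed by the repo; only the list of row contents comes from the step itself.

```python
import hashlib
import json


def source_record_hash(record: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not change the hash."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def manifest_row(source_record: dict, *, source_id: str, transform: str,
                 license_decision: str, reviewer: str, release_eligibility: str) -> str:
    """Build one JSONL line describing how a fixture was derived from its source record."""
    row = {
        "source_id": source_id,
        "source_record_hash": source_record_hash(source_record),
        "transform": transform,
        "license_decision": license_decision,
        "reviewer": reviewer,
        "release_eligibility": release_eligibility,
    }
    return json.dumps(row, sort_keys=True)
```

Hashing a canonical encoding means the same source record always yields the same hash regardless of how its keys were ordered on disk, which makes the manifest reproducible across re-runs.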