feat(field-observation): ingest the 42 + 38 attributed @nvst18 lots #20

Open
waitdeadai wants to merge 2 commits into main from feature/field-observation-ingest

@waitdeadai (Owner)

Ingest the two attributed @nvst18 field-observation lots (42 + 38)

Supersedes the scaffold #19. Both contributor gates cleared on 2026-05-31
(@nvst18 sign-off on ianymu/recognition-without-arrest#2: "The 42
(redaction): Approved… The 38: Generalize the internal class/method names…
before ingest. Both lots good to ingest once the placeholders are in."), so
this PR materializes the payloads the scaffold was waiting on.

What lands

Two separately-tagged sub-batches in one attributed lane (distinct
source_id, sub-batch dir, and manifest transform — provenance never merges):

| Sub-batch | Records | category | Handling |
| --- | --- | --- | --- |
| dispatch_fabrication_42 | 42 (39 pos / 3 neg) | sycophancy | Ingested verbatim from the signed-off redacted candidate. Agent codenames + "Want me to dispatch X" phrasing preserved as the fabrication signal, per your approval. |
| hollow_code_38 | 38 (35 pos / 3 neg) | hollow_code | Internal class/method/const/metric identifiers generalized to role-preserving surrogates (e.g. NaomiGuardrail → ContentGuardrail, getCrisisText → getSafetyText, ScorePtgiWeekly → ScoreWeekly, ptgi_weekly → score_weekly). |

Plus a derived_fixture_manifest.jsonl (80 rows, one per record, hashing each
original source row), registry/README/LANE_SCHEMA updates, and lane registration
in DATASET_CARD.md + CLAIM_LEDGER.md.
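
For reviewers, a minimal sketch of what "one manifest row per record, hashing each original source row" could look like. It is an illustration under assumptions, not the script used in this PR: the field names (`record_index`, `transform`, `source_row_sha256`), the canonical-JSON encoding, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import json


def source_row_hash(row: dict) -> str:
    """Hash a source row via canonical JSON so key order does not change the digest."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def manifest_rows(source_rows, source_id: str, transform: str):
    """Yield one manifest row per source record (field names are hypothetical)."""
    for index, row in enumerate(source_rows):
        yield {
            "source_id": source_id,            # kept distinct per sub-batch
            "record_index": index,
            "transform": transform,            # e.g. "verbatim" or "identifiers_generalized"
            "source_row_sha256": source_row_hash(row),
            "label_final": None,               # candidate only, pending adjudication
        }


def to_jsonl(rows) -> str:
    """Serialize manifest rows as JSON Lines, one object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in rows)
```

Hashing the original row rather than the transformed one lets a generalized record be traced back to its source without storing the source text in the manifest.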

Generalization approach (the 38)

Type-preserving pseudonymization: ~30 distinct identifiers mapped to neutral
surrogates that keep the code syntactically valid (so the fixtures stay
usable for detector training) and preserve the failure signal — e.g. the
safety_prompt_bypass ->body vs getSystemPrompt() contrast survives intact.
A hard residual gate asserts that no original proprietary token (Naomi,
PTGI, Crisis, Therapist, Externalization, NarrativeLetter,
Reflection, Disengagement, and every original class name) survives anywhere.
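
The approach can be sketched roughly as follows. This is a hedged illustration, not the code behind the PR: the four surrogate pairs are the examples quoted in the table, the token list is the one named above, and the whole-identifier regex and case-insensitive substring gate are assumptions about how such a pass might be written.

```python
import re

# Example surrogate pairs taken from the PR description; the real map has ~30 entries.
SURROGATES = {
    "NaomiGuardrail": "ContentGuardrail",
    "getCrisisText": "getSafetyText",
    "ScorePtgiWeekly": "ScoreWeekly",
    "ptgi_weekly": "score_weekly",
}

# Proprietary tokens the residual gate must never find (original class names omitted here).
FORBIDDEN = (
    "Naomi", "PTGI", "Crisis", "Therapist", "Externalization",
    "NarrativeLetter", "Reflection", "Disengagement",
)


def generalize(code: str, mapping=SURROGATES) -> str:
    """Replace whole identifiers only, so surrounding syntax stays valid."""
    names = sorted(mapping, key=len, reverse=True)  # longest first
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])(" + "|".join(map(re.escape, names)) + r")(?![A-Za-z0-9_])"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], code)


def residual_hits(text: str, forbidden=FORBIDDEN) -> list:
    """Return every forbidden token still present (case-insensitive substring match)."""
    lowered = text.lower()
    return [tok for tok in forbidden if tok.lower() in lowered]
```

A substring gate is deliberately blunt: it would also flag an unrelated identifier that merely contains one of the tokens, which is the safer failure direction for a release gate.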

Two notes for your review @nvst18

  1. No commit SHAs were stripped because there are none in the 38 — every
    fix_description is a short code-fix note (e.g. "Captured $validated and
    used $validated['rating']"), not a SHA. (The 27af4ec ref lives only in the
    registry prose as a fix example.)
  2. The 42 (your approved redaction) is verbatim. A maintainer reject-scan
    surfaced only items inside your sign-off — short session-id hex (e.g.
    a2c8f421 = "Session a2c8f421") and a generic github.com/topics link. I did
    not alter your signed-off content; flagging in case you'd like the
    session-id prefixes stripped before this merges.
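
If stripping is preferred, a small sketch of how the session-id prefixes could be found and replaced. The pattern (the word "Session" followed by exactly eight lowercase hex characters) and the `<session-id>` placeholder are assumptions for illustration; neither the reject-scan rule nor a placeholder format is specified in this PR.

```python
import re

# Assumed shape: "Session" followed by an 8-character lowercase hex prefix.
SESSION_ID = re.compile(r"\bSession\s+([0-9a-f]{8})\b")


def find_session_ids(text: str) -> list:
    """Return the hex prefixes that look like session ids."""
    return SESSION_ID.findall(text)


def strip_session_ids(text: str, placeholder: str = "<session-id>") -> str:
    """Replace each session-id prefix with a neutral placeholder."""
    return SESSION_ID.sub(f"Session {placeholder}", text)
```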

Status / safety

  • label_final = null on every record (candidate only; not a released
    data/ lane until two annotation passes + adjudication).
  • Validation run: registry/manifest/payloads parse; validate_corpus.py clean;
    release_workflow_safety_check.py --scan-secrets exit 0 / errors []; pytest
    20 passed; reject-to-quarantine scan clean on both sub-batches.
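
The record-level invariants above can be spot-checked with something like the sketch below. It is not `validate_corpus.py`: the `sub_batch` field name is hypothetical, and the expected counts are simply the 42 and 38 stated in this PR.

```python
import json
from collections import Counter

# Expected row counts per sub-batch, as stated in the PR description.
EXPECTED = {"dispatch_fabrication_42": 42, "hollow_code_38": 38}


def check_manifest(jsonl_text: str, expected=EXPECTED) -> list:
    """Return a list of human-readable problems; an empty list means the checks pass."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    problems = []

    # "sub_batch" is a hypothetical field name for the sub-batch tag.
    counts = Counter(r.get("sub_batch") for r in rows)
    for name, want in expected.items():
        found = counts.get(name, 0)
        if found != want:
            problems.append(f"{name}: expected {want} rows, found {found}")
    for name in counts:
        if name not in expected:
            problems.append(f"unexpected sub-batch: {name!r}")

    # Every record must still be an unlabeled candidate.
    labeled = sum(1 for r in rows if r.get("label_final") is not None)
    if labeled:
        problems.append(f"{labeled} record(s) have a non-null label_final")
    return problems
```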

Attribution to @nvst18 is retained per the contribution terms. Requesting
your review as the contributor before this merges.

🤖 Generated with Claude Code

eliteinterface and others added 2 commits May 28, 2026 15:37
… (held pending sign-off)

Stages field_observation_intake/ (registry + record schema + README) so the 42
dispatch-fabrication and 38 hollow-code contributions from @nvst18 ingest
mechanically once contributor sign-off clears. No payload records: both lots are
release_eligibility=blocked_pending_contributor_signoff. Additive and
self-contained; touches no existing lane, schema, script, or fixture.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Both contributor gates cleared 2026-05-31 (nvst18 sign-off on
ianymu/recognition-without-arrest#2): the 42 redaction approved clean,
the 38 internal class/method names generalized to role-preserving surrogates.

- dispatch_fabrication_42: 42 records, verbatim from signed-off redacted candidate
- hollow_code_38: 38 records, identifiers generalized (signal preserved, zero
  original proprietary token survives), prompt_hash null
- derived_fixture_manifest.jsonl: 80 rows; label_final=null pending adjudication
- registry/README/LANE_SCHEMA updated; lane registered in DATASET_CARD + CLAIM_LEDGER

Supersedes the #19 scaffold. Validation: validate_corpus clean, secret-scan
exit 0, pytest 20 passed, reject-scan clean on both sub-batches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>