feat(f3): bin/pos-selftest.sh — end-to-end selftest of pos plugin (D1/D3/D4/D5/D6)#27
Locks down the contract for `bin/pos-selftest.sh` before implementing the wrapper or its Python orchestrator. Five failing tests verify:
- `bin/pos-selftest.sh` exists and is executable
- `bin/_selftest.py` exists (Python orchestrator)
- Wrapper uses `set -euo pipefail` and delegates to `python3 _selftest.py`
- Running the wrapper from the repo root exits 0

Scope ratified in Phase -1 (see `.claude/branch-approvals/feat_f3-selftest-end-to-end.approved`): functional-critical gates D1 / D3 / D4 / D5 / D6 stop-policy-check; informational D2 + D6 pre-compact deferred; no Claude Code runtime; only cheap static checks for skills/agents.

Following commits:
- Minimal GREEN wrapper + orchestrator (smoke exit 0)
- Incremental RED/GREEN per scenario (D1, D3, D4, D5, D6)
- CI job selftest
- Docs-sync

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Satisfies the contract locked by RED in the previous commit:
- `bin/pos-selftest.sh`: thin bash wrapper, `set -euo pipefail`, execs `python3 _selftest.py`. Both files are executable (mode 0755).
- `bin/_selftest.py`: stdlib orchestrator with empty scenario set (smoke print + return 0). Scenarios D1 / D3 / D4 / D5 / D6 stop are added in subsequent RED/GREEN commits.

All 5 smoke contract tests pass. Hooks suite intact (587 passed + 1 skipped baseline preserved). Skills + agents + bin tests: 237 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Locks down the orchestrator contract: each registered scenario must emit
`[ok] D{N} {name}` on its line. Module-scoped fixture runs the wrapper
once and shares stdout across scenario tests. Fails until _selftest.py
registers + implements D1 against the synthetic project.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
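The per-scenario line contract can be illustrated with a small parser. This is a minimal sketch, not the PR's test code: the helper name `parse_selftest_lines` and the sample output are hypothetical, and only the `[ok] D{N} {name}` and `[fail] D{N} {name}: <diag>` shapes come from this PR.

```python
from __future__ import annotations

import re

# Matches "[ok] D1 pre-branch-gate" or "[fail] D3 pre-write-guard: <diag>".
LINE_RE = re.compile(r"^\[(ok|fail)\] D(\d+) ([^:]+?)(?:: (.*))?$")


def parse_selftest_lines(stdout: str) -> dict[str, tuple[str, str | None]]:
    """Map a scenario id (e.g. "D1") to (status, diag) for each result line."""
    results = {}
    for line in stdout.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            status, number, _name, diag = match.groups()
            results[f"D{number}"] = (status, diag)
    return results


# Hypothetical orchestrator output, for illustration only.
sample = (
    "selftest starting\n"
    "[ok] D1 pre-branch-gate\n"
    "[fail] D3 pre-write-guard: expected exit 2"
)
parsed = parse_selftest_lines(sample)
print(parsed)
```

A scenario test could then assert that its own id maps to `"ok"`, which keeps one shared stdout capture usable by every scenario test.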
Orchestrator generates synthetic project per scenario via real `npx tsx generator/run.ts --profile cli-tool.yaml --out <tmp>`, invokes meta-repo hook against synthetic cwd, asserts deny-without-marker + allow-after-touch contract. Selftest runs in ~1.2s end-to-end. Stdlib only (subprocess + tempfile + shutil + json + pathlib). Each scenario gets its own tmpdir to avoid cross-scenario contamination. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
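The deny-without-marker and allow-after-touch flow can be sketched as follows. This is an illustration under stated assumptions, not the orchestrator's code: the stand-in hook, the `run_hook` helper, the marker file name and the payload shape are all hypothetical, and passing the payload on stdin is an assumption. The generator step is omitted so the sketch runs without Node.

```python
from __future__ import annotations

import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in hook for illustration only (NOT the real meta-repo hook):
# exits 2 while a marker file is missing in cwd, 0 once it exists.
FAKE_HOOK = """\
import json, pathlib, sys
payload = json.load(sys.stdin)
marker = pathlib.Path.cwd() / payload["marker"]
sys.exit(0 if marker.exists() else 2)
"""


def run_hook(hook: Path, payload: dict, cwd: Path) -> subprocess.CompletedProcess:
    """Invoke a hook script with a JSON payload on stdin, using cwd as the project root."""
    return subprocess.run(
        [sys.executable, str(hook)],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        cwd=cwd,
        check=False,
    )


def deny_then_allow() -> tuple[int, int]:
    """Return (exit code before touching the marker, exit code after)."""
    with tempfile.TemporaryDirectory() as tmp:  # one tmpdir per scenario
        project = Path(tmp)
        hook = project / "fake_hook.py"
        hook.write_text(FAKE_HOOK)
        payload = {"marker": "approved.marker"}  # hypothetical payload shape
        before = run_hook(hook, payload, project).returncode
        (project / "approved.marker").touch()
        after = run_hook(hook, payload, project).returncode
        return before, after


codes = deny_then_allow()
print(codes)
```

Keeping each scenario inside its own `TemporaryDirectory` is what prevents one scenario's marker or policy override from leaking into the next.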
Extracts shared diag helper. D3 scenario test fails until the orchestrator registers + implements the pre-write-guard contract (deny Write to enforced path without test pair, allow once test pair exists) against the synthetic project. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Synthetic project's rendered policy.yaml lacks pre_write (template drift documented post-D5b), so the scenario writes a minimal policy override into synthetic/policy.yaml before invoking. Then asserts deny on `Write hooks/foo.py` without test pair, allow once hooks/tests/test_foo.py exists. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fails until the orchestrator registers + implements the pre-pr-gate contract: deny `gh pr create` when docs-sync (ROADMAP.md + HANDOFF.md) is missing from the diff, allow once docs-sync is satisfied. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Synthetic project's policy.yaml is overridden with a minimal pre_pr section (baseline + empty conditional — loader requires both keys). Init git on main, commit baseline, branch feat/example with a code-only commit, assert deny on `gh pr create`. Add ROADMAP/HANDOFF changes, re-invoke, assert allow. Factors `git_in()` + `init_baseline_repo()` helpers — D5 will reuse them for reflog-based detection. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fails until the orchestrator registers + implements the post-action contract: a confirmed `git merge` whose diff matches a configured trigger emits the `Consider running /pos:compound` advisory. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Override synthetic policy with minimal post_merge trigger (fnmatch-style non-recursive globs, since fnmatch gives `**` no recursive meaning). Init git on main, branch feat/example, add `generator/feature.ts`, merge --no-ff, then invoke post-action with a `git merge` payload. Asserts exit 0 and `/pos:compound` advisory in stdout. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
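The glob behaviour behind this adjustment can be reproduced with Python's standard `fnmatch` module. The sketch assumes the hook matches paths with `fnmatch.fnmatch`; the PR only says "fnmatch-style", so treat that as an assumption.

```python
from fnmatch import fnmatch

path = "generator/feature.ts"

# "**" is just two ordinary "*" wildcards here, so this pattern still
# requires a "/" between "generator/" and the file name.
recursive_style = fnmatch(path, "generator/**/*.ts")

# A single "*" matches the file directly under generator/.
flat_style = fnmatch(path, "generator/*.ts")

# fnmatch's "*" also matches "/", so the flat pattern covers nested files too.
nested = fnmatch("generator/sub/feature.ts", "generator/*.ts")

print(recursive_style, flat_style, nested)
```

Under that assumption, a trigger written as `generator/**/*.ts` would miss a file placed directly in `generator/`, which is consistent with the switch to `generator/*.ts`.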
Fails until the orchestrator registers + implements the stop-policy-check contract: with `skills_allowed` declared in policy.yaml + a rogue invocation in `.claude/logs/skills.jsonl` for the active session, the Stop hook denies exit 2; an unrelated session_id with no invocations allows exit 0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Override synthetic policy.yaml with `skills_allowed: ["pos:simplify"]`,
seed `.claude/logs/skills.jsonl` with a rogue invocation under
session_id `sess-rogue`. Deny phase: Stop payload `{session_id:
"sess-rogue"}` triggers exit 2 deny. Allow phase: a different
session_id with no recorded invocations passes through with exit 0.
Locks down the session-scoping contract end-to-end.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
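The session-scoping contract can be sketched as below. This is not the Stop hook's implementation: the helper names, the `skill` field name, and the `pos:rogue` / `sess-clean` values are hypothetical. Only `skills_allowed`, `.claude/logs/skills.jsonl`, `sess-rogue`, `pos:simplify` and the exit codes 2 / 0 come from the commit message.

```python
import json
import tempfile
from pathlib import Path


def rogue_skills(log_path: Path, session_id: str, allowed: set) -> list:
    """Return skills logged for session_id that are not in the allowed set."""
    if not log_path.exists():
        return []
    rogue = []
    for line in log_path.read_text().splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        # Only entries from the active session count; other sessions are ignored.
        if entry.get("session_id") != session_id:
            continue
        skill = entry.get("skill")  # hypothetical field name
        if skill is not None and skill not in allowed:
            rogue.append(skill)
    return rogue


def stop_exit_code(log_path: Path, session_id: str, allowed: set) -> int:
    """Exit 2 (deny) if the session used a disallowed skill, else exit 0 (allow)."""
    return 2 if rogue_skills(log_path, session_id, allowed) else 0


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "skills.jsonl"
    log.write_text(json.dumps({"session_id": "sess-rogue", "skill": "pos:rogue"}) + "\n")
    allowed = {"pos:simplify"}
    deny_code = stop_exit_code(log, "sess-rogue", allowed)
    allow_code = stop_exit_code(log, "sess-clean", allowed)

print(deny_code, allow_code)
```

Filtering on `session_id` before checking the allow-list is the design point: a rogue invocation in one session does not block an unrelated session from stopping.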
`selftest` job runs `pytest bin/tests -q` on ubuntu × Python 3.11 with Node setup (for `npx tsx generator/run.ts`). Covers smoke wrapper + 5 functional-critical scenarios end-to-end. Move integration bullet from "Diferidos" to "Aterrizado" per the invariant in `.claude/rules/ci-cd.md`. Add a dedicated H3 documenting the job's scope, what it covers, what it explicitly does not (D2 informative, Claude Code runtime), and the synthetic-policy drift. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…chestrator
Each of the 5 scenarios in bin/_selftest.py repeated the same deny phase (exit 2 + permissionDecision deny check) and allow phase (exit 0 check) boilerplate. Extracting two small helpers (check_deny, check_allow) removes ~30 lines of duplication and makes scenario intent more readable without hiding what each scenario asserts. Pre-PR simplify pass (CLAUDE.md rule #7 satisfied: 5 instances). 829 passed + 1 skipped, no regression. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
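A minimal sketch of what such helpers might look like follows. The signatures, the return convention (a diagnostic string or `None`), and the assumption that the deny decision appears in stdout are all hypothetical; only the names `check_deny` / `check_allow` and the exit 2 + `permissionDecision` deny / exit 0 checks come from the commit message.

```python
from __future__ import annotations

import subprocess


def check_deny(result: subprocess.CompletedProcess) -> str | None:
    """Return a diagnostic if result is not a deny (exit 2 + deny decision), else None."""
    if result.returncode != 2:
        return f"expected exit 2, got {result.returncode}"
    out = result.stdout or ""
    if "permissionDecision" not in out or "deny" not in out:
        return "missing permissionDecision deny in stdout"
    return None


def check_allow(result: subprocess.CompletedProcess) -> str | None:
    """Return a diagnostic if result is not an allow (exit 0), else None."""
    if result.returncode != 0:
        return f"expected exit 0, got {result.returncode}"
    return None


# Hand-built results standing in for real hook invocations.
denied = subprocess.CompletedProcess(
    args=["hook"], returncode=2, stdout='{"permissionDecision": "deny"}', stderr=""
)
allowed = subprocess.CompletedProcess(args=["hook"], returncode=0, stdout="", stderr="")

deny_diag = check_deny(denied)
allow_diag = check_allow(allowed)
wrong_diag = check_deny(allowed)
print(deny_diag, allow_diag, wrong_diag)
```

Returning a diagnostic rather than raising would let a scenario print `[fail] D{N} {name}: <diag>` directly, though the actual helpers may handle failures differently.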
Close F3 in the standard docs-sync surfaces:
- ROADMAP.md: F-row 3/4 branches, F3 row → ✅, full progress block under § F.
- HANDOFF.md: §1 snapshot updated (F3 closed, F4 next), §9 next-branch pointer flipped to F4, new §21 with F3 state (deliverables, scenarios, out-of-scope, adjustments).
- MASTER_PLAN.md § Rama F3: stub → realized decisions (A1.b shape, A2 functional-critical subset, A3 tmpdir + cli-tool, A4 exit + tokens, A5 single-matrix CI job, A6 no Claude runtime), implementation adjustments documented (fnmatch literal vs recursive, docs_sync_rules double-key contract, ci-cd.md H3 placement), drift open post-F3.
- docs/ARCHITECTURE.md § 10: new "Selftest end-to-end (delivered in F3)" subsection inside the Testing (three levels) section. Documents wrapper + orchestrator + scenarios + CI + drift.

Pre-PR gate (D4 dogfooding) satisfied: ROADMAP + HANDOFF in diff; no conditional triggers apply (bin/** + .github/** outside the rules).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
Adds an end-to-end “selftest” harness that generates a synthetic project with the real generator and runs several functional-critical hooks against it, wiring this into CI and documenting the new testing layer.
Changes:
- Introduces `bin/pos-selftest.sh` + `bin/_selftest.py` to generate a synthetic repo and run D1/D3/D4/D5/D6 gate scenarios.
- Adds a pytest harness under `bin/tests/` (smoke contract + scenario assertions).
- Wires the selftest into GitHub Actions (`selftest` job) and updates docs (ARCHITECTURE/ROADMAP/MASTER_PLAN/HANDOFF + CI/CD rules).
Reviewed changes
Copilot reviewed 6 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/ARCHITECTURE.md | Documents the new selftest testing layer and its scope. |
| bin/pos-selftest.sh | Adds a stable bash entrypoint delegating to the Python orchestrator. |
| bin/_selftest.py | Implements scenario orchestration: generate synthetic repo, override policy sections, invoke real hooks, assert outcomes. |
| bin/tests/test_selftest_smoke.py | Smoke tests for wrapper presence/shape + wrapper execution contract. |
| bin/tests/test_selftest_scenarios.py | Scenario-level contract assertions against orchestrator stdout. |
| ROADMAP.md | Updates phase tracking and records F3 deliverables. |
| MASTER_PLAN.md | Expands F3 scope/results documentation. |
| HANDOFF.md | Updates snapshot/next-branch status and F3 summary. |
| .github/workflows/ci.yml | Adds a dedicated selftest job running pytest bin/tests -q. |
| .claude/rules/ci-cd.md | Updates CI/CD rules to reflect the new selftest job and scope. |
Deliverables:

- `bin/pos-selftest.sh` (9 lines): minimal bash wrapper (`#!/usr/bin/env bash` + `set -euo pipefail` + delegates to `python3 bin/_selftest.py`). It contains no logic; it is a stable entrypoint that tests and CI consume without depending on an absolute path.
- `bin/_selftest.py` (~344 lines): stdlib orchestrator (no external dependencies). For each scenario it creates a tmpdir, runs `npx tsx generator/run.ts --profile questionnaire/profiles/cli-tool.yaml --out <tmpdir>` to generate a synthetic project, overwrites `synthetic/policy.yaml` with the minimal section the scenario needs, sets up the synthetic project as a git repo (`git init -b main` + baseline commit), and invokes the real hook (`hooks/<name>.py`) via subprocess with a JSON payload. Asserts the exit code plus the presence of tokens in stdout/stderr/files. Prints `[ok] D{N} {name}` or `[fail] D{N} {name}: <diag>`. Exits 0/1 depending on pass/fail.

The docs pin an approximate line count for bin/_selftest.py ("~344 lines"), but the file currently differs and will naturally drift over time. To avoid docs becoming stale, consider removing the line count or describing it more generically (e.g., "~300 lines" / "~320 lines" / "~few hundred lines").

Suggested change:

- `bin/_selftest.py` (a few hundred lines): stdlib orchestrator (no external dependencies). For each scenario it creates a tmpdir, runs `npx tsx generator/run.ts --profile questionnaire/profiles/cli-tool.yaml --out <tmpdir>` to generate a synthetic project, overwrites `synthetic/policy.yaml` with the minimal section the scenario needs, sets up the synthetic project as a git repo (`git init -b main` + baseline commit), and invokes the real hook (`hooks/<name>.py`) via subprocess with a JSON payload. Asserts the exit code plus the presence of tokens in stdout/stderr/files. Prints `[ok] D{N} {name}` or `[fail] D{N} {name}: <diag>`. Exits 0/1 depending on pass/fail.
**Delivered files**:

- `bin/pos-selftest.sh`: bash wrapper (`#!/usr/bin/env bash` + `set -euo pipefail` + delegates to `python3 bin/_selftest.py`). 9 lines. No logic.
- `bin/_selftest.py`: Python stdlib orchestrator (~344 lines). Per scenario: tmpdir + real generator + overwrites the minimal section of `synthetic/policy.yaml` + sets up a git repo (`git init -b main` + baseline commit) + invokes the real hook via subprocess + asserts exit + tokens.

This section hardcodes an approximate line count for bin/_selftest.py ("~344 lines"). Since the file length changes as scenarios evolve, this is likely to go stale; consider removing the line count or making it less specific.

Suggested change:

- `bin/_selftest.py`: Python stdlib orchestrator. Per scenario: tmpdir + real generator + overwrites the minimal section of `synthetic/policy.yaml` + sets up a git repo (`git init -b main` + baseline commit) + invokes the real hook via subprocess + asserts exit + tokens.
**Deliverables**:

- `bin/pos-selftest.sh` (9 lines): minimal bash wrapper (`#!/usr/bin/env bash` + `set -euo pipefail` + delegates to `python3 bin/_selftest.py`). No logic; stable entrypoint.
- `bin/_selftest.py` (~344 lines): Python stdlib orchestrator. Per scenario: tmpdir + `npx tsx generator/run.ts --profile cli-tool.yaml --out <tmpdir>` to generate a synthetic project, overwrites the minimal section of `synthetic/policy.yaml` that the scenario needs, sets up the synthetic project as a git repo (`git init -b main` + baseline commit), invokes the real hook (`hooks/<name>.py`) via subprocess with a JSON payload, asserts exit + tokens in stdout/stderr/files. Prints `[ok] D{N} {name}` or `[fail] D{N} {name}: <diag>`.

The snapshot notes bin/_selftest.py as "~344 lines", but this value will drift as the orchestrator changes. Consider dropping the line count (or making it intentionally coarse) to keep the docs accurate long-term.

Suggested change:

- `bin/_selftest.py` (Python stdlib orchestrator). Per scenario: tmpdir + `npx tsx generator/run.ts --profile cli-tool.yaml --out <tmpdir>` to generate a synthetic project, overwrites the minimal section of `synthetic/policy.yaml` that the scenario needs, sets up the synthetic project as a git repo (`git init -b main` + baseline commit), invokes the real hook (`hooks/<name>.py`) via subprocess with a JSON payload, asserts exit + tokens in stdout/stderr/files. Prints `[ok] D{N} {name}` or `[fail] D{N} {name}: <diag>`.
…3 drift
F3 documented an open drift: `templates/policy.yaml.hbs` + `generator/renderers/policy.ts` still emit the pre-D5b shape, which F3 worked around via per-scenario overlays in `bin/_selftest.py`. Open the stub branch slot now so the carry-over has a concrete home:
- MASTER_PLAN.md § Rama F3b: full stub (scope, context to read, decisions to close in Phase -1, exit criteria, rationale for not delivering it in F3). The F3b position mirrors the precedent of refactor/d5-policy-loader as Rama D5b.
- ROADMAP.md: new row between F3 and F4. F3 row updated with PR #27.
- HANDOFF.md § 7 Gotchas: drift bullet now points to the stub instead of saying "deferred to its own branch post-D6".

No code changes. Stub is informational; Phase -1 happens when F4 closes (or when a consumer requires post-D5b shape in generator output).
Copilot flagged "~344 lines" hardcoded for bin/_selftest.py in 3 docs (ROADMAP § F3, HANDOFF §21, MASTER_PLAN § Rama F3). Already stale: the file is 321 lines after the simplify pass (bb146bc), and will keep drifting as scenarios evolve. Drop the precise count entirely; the description carries the magnitude signal without the maintenance debt.

Wrapper count ("9 lines" for bin/pos-selftest.sh) kept: minimal-by-design, stable as long as the wrapper just execs the orchestrator.

Triage: 3 inline comments, all FIX (low value / trivial effort). 0 issue/conversation comments. Review-body was the PR overview summary.
Address review (1/1 reviewer = Copilot, 3 inline comments, all same suggestion).
FIX (3), commit 4fbffe1:
KEPT: wrapper
SKIP / DISCUSS: none. Review-body was the PR overview summary. 0 issue comments.
Summary

- `bin/pos-selftest.sh` (minimal bash wrapper, 9 lines) + `bin/_selftest.py` (Python stdlib orchestrator, ~297 lines) exercise the plugin's functional-critical gates against a synthetic project generated in real time by `npx tsx generator/run.ts --profile cli-tool.yaml`.
- […] `touch <marker>`), D3 pre-write-guard (deny `Write hooks/foo.py` without a test pair → allow after creating the test), D4 pre-pr-gate (deny `gh pr create` without docs-sync → allow after committing ROADMAP+HANDOFF), D5 post-action (confirmed `git merge` → `/pos:compound` advisory), D6 stop-policy-check (Stop with rogue `session_id` deny / clean allow).
- `selftest` in `.github/workflows/ci.yml` (ubuntu × Python 3.11, single matrix). Single command: `pytest bin/tests -q`.
- `bin/tests/test_selftest_smoke.py` (4 tests, wrapper contract) + `bin/tests/test_selftest_scenarios.py` (5 tests, module-scoped fixture).
- `agents/tests/test_agent_frontmatter.py` and `.claude/skills/tests/test_skill_frontmatter.py`.

Out of scope (ratified in Phase -1): D2 session-start + D6 pre-compact (informative-only, no deny/allow contract), Claude Code runtime, D5b loader (covered indirectly: hooks D3/D4/D5 consume it, and the scenarios overwrite only the relevant section of `synthetic/policy.yaml` to decouple coverage from the template migration).

Global suite post-F3: 829 passed + 1 skipped (vs. F2 baseline 819 + 1 skip; +10 net). No regression in D1..D6 + E1a..E3b + F1 + F2. Local end-to-end selftest ~1.2s.

Ratified Phase -1 decisions: A1.b (bash wrapper + Python orchestrator + pytest smoke), A2 (functional-critical subset), A3 (tmpdir + cli-tool + real generator, no committed fixture), A4 (exit code + tokens, no golden diff), A5 (job in `ci.yml`, no separate workflow, single matrix), A6 (no Claude runtime).

Adjustments during implementation: `fnmatch` does not recurse on `**/` (fixed `generator/**/*.ts` → `generator/*.ts`); `docs_sync_rules()` requires both `docs_sync_required` AND `docs_sync_conditional` (the override adds `[]` to satisfy it); the ci-cd.md H3 placement was moved to after item 3 (release.yml) because of MD029/MD032 lint.

Open drift post-F3: `templates/policy.yaml.hbs` and `generator/renderers/policy.ts` still emit the pre-D5b shape (each scenario overwrites the section it needs in `synthetic/policy.yaml`). To be reopened in its own branch post-F3.

Pre-PR simplify pass: extracted `check_deny` / `check_allow` helpers in `bin/_selftest.py` (5 instances of the pattern, CLAUDE.md rule #7 satisfied). 344 → 297 lines; no regression.

Docs-sync: […] `### Job selftest (delivered in F3)`).

Test plan

- `pytest bin/tests -q`: 10 passed (4 smoke + 5 scenarios + 1 GREEN smoke).
- `pytest hooks/tests .claude/skills/tests agents/tests bin/tests -q`: 829 passed + 1 skipped (no regression).
- `npm run typecheck`: clean.
- `selftest` mirrored from the local command (same `pytest bin/tests -q`).
- `selftest` in GitHub Actions (post-push).

🤖 Generated with Claude Code