feat(f3): bin/pos-selftest.sh — end-to-end selftest of pos plugin (D1/D3/D4/D5/D6) by javiAI · Pull Request #27 · javiAI/project-operating-system

javiAI · 2026-04-26T17:53:39Z

Summary

bin/pos-selftest.sh (minimal bash wrapper, 9 lines) + bin/_selftest.py (stdlib-only Python orchestrator, ~297 lines) exercise the plugin's functional-critical gates against a synthetic project generated at run time by npx tsx generator/run.ts --profile cli-tool.yaml.
5 functional-critical scenarios: D1 pre-branch-gate (deny without marker → allow after touch <marker>), D3 pre-write-guard (deny Write hooks/foo.py without a test pair → allow after creating the test), D4 pre-pr-gate (deny gh pr create without docs-sync → allow after committing ROADMAP+HANDOFF), D5 post-action (confirmed git merge → /pos:compound advisory), D6 stop-policy-check (Stop with a rogue session_id → deny; clean session → allow).
CI: new selftest job in .github/workflows/ci.yml (ubuntu × Python 3.11, single matrix). Single command: pytest bin/tests -q.
Pytest harness: bin/tests/test_selftest_smoke.py (4 tests, wrapper contract) + bin/tests/test_selftest_scenarios.py (5 tests, module-scoped fixture).
No Claude Code runtime and no real skill/agent invocations; static coverage stays in agents/tests/test_agent_frontmatter.py and .claude/skills/tests/test_skill_frontmatter.py.
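The orchestrator's reporting contract (one `[ok]` or `[fail]` line per scenario, exit 0 only when every scenario passes) can be sketched as below. This is a minimal sketch, not the code in bin/_selftest.py: the `run_scenarios` name and the convention that each scenario returns `None` or a diagnostic string are assumptions.

```python
from typing import Callable, Optional

# Hypothetical scenario shape: (id, name, check). A check returns None on
# success or a short diagnostic string on failure.
Scenario = tuple[str, str, Callable[[], Optional[str]]]


def run_scenarios(scenarios: list[Scenario]) -> int:
    """Run every scenario, print one status line each, return the exit code."""
    failures = 0
    for sid, name, check in scenarios:
        try:
            diag = check()
        except Exception as exc:  # a crashing scenario is reported, not fatal
            diag = f"{type(exc).__name__}: {exc}"
        if diag is None:
            print(f"[ok] {sid} {name}")
        else:
            failures += 1
            print(f"[fail] {sid} {name}: {diag}")
    return 0 if failures == 0 else 1
```

An entrypoint built this way would end with `sys.exit(run_scenarios(SCENARIOS))`, so the bash wrapper's exit status reflects the overall result.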

Out of scope (ratified in Fase -1): D2 session-start + D6 pre-compact (informative only, no deny/allow contract), the Claude Code runtime, and the D5b loader (covered indirectly: hooks D3/D4/D5 consume it, and the scenarios override only the relevant section of synthetic/policy.yaml to decouple coverage from the template migration).

Global suite post-F3: 829 passed + 1 skipped (vs. the F2 baseline of 819 + 1 skipped; +10 net). No regression across D1..D6 + E1a..E3b + F1 + F2. The local end-to-end selftest runs in ~1.2s.

Fase -1 decisions ratified: A1.b (bash wrapper + Python orchestrator + pytest smoke), A2 (functional-critical subset), A3 (tmpdir + cli-tool + real generator, no committed fixture), A4 (exit code + tokens, no golden diff), A5 (job in ci.yml rather than a separate workflow, single matrix), A6 (no Claude runtime).

Adjustments during implementation: fnmatch gives ** no recursive meaning (it is just two * wildcards), so generator/**/*.ts required a subdirectory and missed generator/feature.ts; corrected to generator/*.ts. docs_sync_rules() requires both docs_sync_required AND docs_sync_conditional (the override adds an empty [] to satisfy it). The ci-cd.md H3 was moved to after item 3 (release.yml) to satisfy the MD029/MD032 lint rules.
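The fnmatch gotcha is easy to reproduce with the standard library. A minimal sketch; it uses `fnmatchcase` to avoid platform case normalization, which may differ from the exact call the hooks make.

```python
from fnmatch import fnmatchcase

# In fnmatch, "*" matches any run of characters *including* "/", and "**"
# has no special recursive meaning: it is simply two "*" wildcards.
NESTED_ONLY = "generator/**/*.ts"  # the literal "/" after "**" demands a subdirectory
ANY_DEPTH = "generator/*.ts"       # "*" already crosses "/" in fnmatch


def classify(path: str) -> tuple[bool, bool]:
    """Return (matches NESTED_ONLY, matches ANY_DEPTH) for a repo-relative path."""
    return fnmatchcase(path, NESTED_ONLY), fnmatchcase(path, ANY_DEPTH)


print(classify("generator/feature.ts"))      # (False, True)
print(classify("generator/sub/feature.ts"))  # (True, True)
```

So the flatter glob is the one that behaves "recursively" under fnmatch semantics.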

Drift still open post-F3: templates/policy.yaml.hbs and generator/renderers/policy.ts still emit the pre-D5b shape (each scenario overrides the section it needs in synthetic/policy.yaml). To be reopened in a dedicated branch post-F3.

Pre-PR simplify pass: extracted check_deny / check_allow helpers in bin/_selftest.py (5 instances of the pattern, satisfying CLAUDE.md rule #7). 344 → 297 lines; no regression.

Docs-sync:

ROADMAP.md (F row at 3/4 branches + § F3 progress block).
HANDOFF.md (§1 snapshot + §9 next branch → F4 + §21 F3 status).
MASTER_PLAN.md (§ Rama F3 expanded with realized decisions + adjustments + drift).
docs/ARCHITECTURE.md (§ 10: new subsection "Selftest end-to-end (entregado en F3)").
.claude/rules/ci-cd.md ("integración end-to-end" bullet promoted from "Diferidos" to "Aterrizado" + new H3 ### Job selftest (entregado en F3)).

Test plan

pytest bin/tests -q: 10 passed (4 smoke + 5 scenarios + 1 GREEN smoke).
pytest hooks/tests .claude/skills/tests agents/tests bin/tests -q: 829 passed + 1 skipped (no regression).
npm run typecheck: clean.
The CI selftest job mirrors the local command (same pytest bin/tests -q).
Verify that the new selftest job is green in GitHub Actions (post-push).

🤖 Generated with Claude Code

Locks down the contract for `bin/pos-selftest.sh` before implementing the wrapper or its Python orchestrator. Five failing tests verify:

- `bin/pos-selftest.sh` exists and is executable
- `bin/_selftest.py` exists (Python orchestrator)
- the wrapper uses `set -euo pipefail` and delegates to `python3 _selftest.py`
- running the wrapper from the repo root exits 0

Scope ratified in Fase -1 (see `.claude/branch-approvals/feat_f3-selftest-end-to-end.approved`): functional-critical gates D1 / D3 / D4 / D5 / D6 stop-policy-check; informative D2 + D6 pre-compact deferred; no Claude Code runtime; only cheap static checks for skills/agents.

Following commits:

- minimal GREEN wrapper + orchestrator (smoke exit 0)
- incremental RED/GREEN per scenario (D1, D3, D4, D5, D6)
- CI job selftest
- Docs-sync

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
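The static part of that smoke contract can be expressed as one helper that returns a list of violations. A minimal sketch: `wrapper_contract_problems` is a hypothetical name, and simple substring checks stand in for whatever bin/tests/test_selftest_smoke.py actually asserts. The last check (exit 0 from the repo root) needs a real run and is left out.

```python
import os
from pathlib import Path


def wrapper_contract_problems(bin_dir: Path) -> list[str]:
    """Return static contract violations for the selftest wrapper (empty = OK)."""
    wrapper = bin_dir / "pos-selftest.sh"
    orchestrator = bin_dir / "_selftest.py"
    if not wrapper.is_file():
        return [f"missing {wrapper.name}"]
    problems: list[str] = []
    if not os.access(wrapper, os.X_OK):
        problems.append(f"{wrapper.name} is not executable")
    if not orchestrator.is_file():
        problems.append(f"missing {orchestrator.name}")
    text = wrapper.read_text()
    # Substring checks: a sketch, not a shell parser.
    if "set -euo pipefail" not in text:
        problems.append("wrapper does not use 'set -euo pipefail'")
    if "python3" not in text or "_selftest.py" not in text:
        problems.append("wrapper does not delegate to python3 _selftest.py")
    return problems
```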

Satisfies the contract locked by RED in the previous commit:

- bin/pos-selftest.sh: thin bash wrapper, set -euo pipefail, execs python3 _selftest.py. Both files are executable (mode 0755).
- bin/_selftest.py: stdlib orchestrator with an empty scenario set (smoke print + return 0). Scenarios D1 / D3 / D4 / D5 / D6 stop are added in subsequent RED/GREEN commits.

All 5 smoke contract tests pass. Hooks suite intact (587 passed + 1 skipped baseline preserved). Skills + agents + bin tests: 237 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Locks down the orchestrator contract: each registered scenario must emit `[ok] D{N} {name}` on its line. Module-scoped fixture runs the wrapper once and shares stdout across scenario tests. Fails until _selftest.py registers + implements D1 against the synthetic project. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
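On the consumer side, the scenario tests only need to index that stdout by scenario id. A minimal sketch with a hypothetical `parse_status` helper; the line format is the `[ok] D{N} {name}` / `[fail] D{N} {name}: <diag>` shape described in this PR.

```python
import re

# "[ok] D1 pre-branch-gate" or "[fail] D3 pre-write-guard: <diag>"
STATUS_RE = re.compile(r"^\[(ok|fail)\] (D\d+) ([^\s:]+)(?:: (.*))?$")


def parse_status(stdout: str) -> dict[str, dict[str, str]]:
    """Index orchestrator status lines by scenario id; other lines are ignored."""
    status: dict[str, dict[str, str]] = {}
    for line in stdout.splitlines():
        match = STATUS_RE.match(line.strip())
        if match:
            result, sid, name, diag = match.groups()
            status[sid] = {"result": result, "name": name, "diag": diag or ""}
    return status
```

With pytest, a `@pytest.fixture(scope="module")` would run the wrapper once and return `parse_status(result.stdout)`; each of the five scenario tests would then assert something like `status["D1"]["result"] == "ok"`.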

Orchestrator generates synthetic project per scenario via real `npx tsx generator/run.ts --profile cli-tool.yaml --out <tmp>`, invokes meta-repo hook against synthetic cwd, asserts deny-without-marker + allow-after-touch contract. Selftest runs in ~1.2s end-to-end. Stdlib only (subprocess + tempfile + shutil + json + pathlib). Each scenario gets its own tmpdir to avoid cross-scenario contamination. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
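The "invoke the meta-repo hook against the synthetic cwd" step boils down to one subprocess call. A minimal sketch, assuming the hooks read their JSON payload from stdin (as Claude Code hooks do) and using `sys.executable` in place of a bare `python3`; `invoke_hook` is a hypothetical name.

```python
import json
import subprocess
import sys
from pathlib import Path


def invoke_hook(hook: Path, payload: dict, cwd: Path, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run `hook` with `payload` as JSON on stdin, from the synthetic project's cwd.

    Output is captured as text so callers can assert on the exit code and tokens.
    """
    return subprocess.run(
        [sys.executable, str(hook)],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        cwd=cwd,
        timeout=timeout,
    )
```

Each scenario would call it inside its own `tempfile.TemporaryDirectory()`, after the generator has rendered the synthetic project there, so no state leaks between scenarios.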

Extracts shared diag helper. D3 scenario test fails until the orchestrator registers + implements the pre-write-guard contract (deny Write to enforced path without test pair, allow once test pair exists) against the synthetic project. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Synthetic project's rendered policy.yaml lacks pre_write (template drift documented post-D5b), so the scenario writes a minimal policy override into synthetic/policy.yaml before invoking. Then asserts deny on `Write hooks/foo.py` without test pair, allow once hooks/tests/test_foo.py exists. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
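The allow/deny rule this scenario exercises can be pictured as a path mapping plus an existence check. A minimal sketch: the `<dir>/tests/test_<name>.py` convention is inferred only from the `hooks/foo.py` → `hooks/tests/test_foo.py` example here, and `expected_test_pair` / `pre_write_decision` are hypothetical names. The real pre_write policy is configured in policy.yaml and may map paths differently.

```python
from pathlib import Path, PurePosixPath
from typing import Optional


def expected_test_pair(rel_path: str) -> Optional[PurePosixPath]:
    """Map 'hooks/foo.py' to 'hooks/tests/test_foo.py' (hypothetical rule).

    Returns None for paths this sketch does not enforce: non-Python files and
    files that already live under a tests/ directory.
    """
    path = PurePosixPath(rel_path)
    if path.suffix != ".py" or "tests" in path.parts:
        return None
    return path.parent / "tests" / f"test_{path.name}"


def pre_write_decision(project: Path, rel_path: str) -> str:
    """'deny' a Write to an enforced file whose test pair is missing, else 'allow'."""
    pair = expected_test_pair(rel_path)
    if pair is None or project.joinpath(*pair.parts).is_file():
        return "allow"
    return "deny"
```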

Fails until the orchestrator registers + implements the pre-pr-gate contract: deny `gh pr create` when docs-sync (ROADMAP.md + HANDOFF.md) is missing from the diff, allow once docs-sync is satisfied. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Synthetic project's policy.yaml is overridden with a minimal pre_pr section (baseline + empty conditional — loader requires both keys). Init git on main, commit baseline, branch feat/example with a code-only commit, assert deny on `gh pr create`. Add ROADMAP/HANDOFF changes, re-invoke, assert allow. Factors `git_in()` + `init_baseline_repo()` helpers — D5 will reuse them for reflog-based detection. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
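The docs-sync half of that contract, including the loader's insistence on both keys, can be sketched as follows. Assumptions: the pre_pr section is modelled as a plain dict with `docs_sync_required` (a list of paths) and `docs_sync_conditional` (left empty and not evaluated here), and both helper names are hypothetical. `git diff --name-only main...HEAD` lists the files changed since the branch diverged from main.

```python
import subprocess
from pathlib import Path


def branch_changed_files(repo: Path, base: str = "main") -> set[str]:
    """Files changed on HEAD since it diverged from `base` (three-dot diff)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return {line for line in out.splitlines() if line}


def docs_sync_missing(pre_pr: dict, changed: set[str]) -> list[str]:
    """Required docs absent from the diff; raises if either key is missing.

    docs_sync_conditional must be present (even as []), but its rules are not
    evaluated in this sketch.
    """
    for key in ("docs_sync_required", "docs_sync_conditional"):
        if key not in pre_pr:
            raise KeyError(f"pre_pr section is missing {key!r}")
    return [doc for doc in pre_pr["docs_sync_required"] if doc not in changed]
```

A non-empty result corresponds to the deny phase; committing ROADMAP.md and HANDOFF.md empties it and the gate allows.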

Fails until the orchestrator registers + implements the post-action contract: a confirmed `git merge` whose diff matches a configured trigger emits the `Consider running /pos:compound` advisory. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Override synthetic policy with minimal post_merge trigger (fnmatch-style non-recursive globs since `**` is literal in fnmatch). Init git on main, branch feat/example, add `generator/feature.ts`, merge --no-ff, then invoke post-action with a `git merge` payload. Asserts exit 0 and `/pos:compound` advisory in stdout. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
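The advisory decision itself reduces to matching the merged files against the trigger globs. A minimal sketch: the advisory wording comes from the commit message above, while `post_merge_advisory` and the "(matched: ...)" suffix are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Wording taken from the commit message; the hook's exact output may differ.
ADVISORY = "Consider running /pos:compound"


def post_merge_advisory(changed: list[str], triggers: list[str]) -> Optional[str]:
    """Return the advisory if any merged file matches a trigger glob, else None.

    Triggers are fnmatch-style: "generator/*.ts" matches at any depth,
    because "*" also matches "/".
    """
    hits = sorted({path for path in changed for glob in triggers if fnmatchcase(path, glob)})
    if not hits:
        return None
    return f"{ADVISORY} (matched: {', '.join(hits)})"
```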

Fails until the orchestrator registers + implements the stop-policy-check contract: with `skills_allowed` declared in policy.yaml + a rogue invocation in `.claude/logs/skills.jsonl` for the active session, the Stop hook denies exit 2; an unrelated session_id with no invocations allows exit 0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Override synthetic policy.yaml with `skills_allowed: ["pos:simplify"]`, seed `.claude/logs/skills.jsonl` with a rogue invocation under session_id `sess-rogue`. Deny phase: Stop payload `{session_id: "sess-rogue"}` triggers exit 2 deny. Allow phase: a different session_id with no recorded invocations passes through with exit 0. Locks down the session-scoping contract end-to-end. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
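The session-scoping rule can be sketched as a filter over the JSONL log. Assumptions: each log line is a JSON object with `session_id` and `skill` keys (the real `.claude/logs/skills.jsonl` schema may differ), the skill name `pos:rogue` in the usage below is made up, and `rogue_skills` / `stop_exit_code` are hypothetical names.

```python
import json
from pathlib import Path


def rogue_skills(log_path: Path, session_id: str, allowed: set[str]) -> list[str]:
    """Skills invoked during `session_id` that are not listed in `allowed`.

    Hypothetical log schema: one JSON object per line with "session_id" and
    "skill" keys. Entries from other sessions are ignored.
    """
    if not log_path.is_file():
        return []
    rogue: list[str] = []
    for line in log_path.read_text().splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("session_id") != session_id:
            continue
        skill = entry.get("skill", "")
        if skill not in allowed:
            rogue.append(skill)
    return rogue


def stop_exit_code(log_path: Path, session_id: str, allowed: set[str]) -> int:
    """2 (deny) if the session used a skill outside `allowed`, else 0 (allow)."""
    return 2 if rogue_skills(log_path, session_id, allowed) else 0
```

With `allowed = {"pos:simplify"}` and a seeded `{"session_id": "sess-rogue", "skill": "pos:rogue"}` line, `sess-rogue` yields 2 while any other session id yields 0, which is the session-scoping property the scenario locks down.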

`selftest` job runs `pytest bin/tests -q` on ubuntu × Python 3.11 with Node setup (for `npx tsx generator/run.ts`). Covers smoke wrapper + 5 functional-critical scenarios end-to-end. Move integration bullet from "Diferidos" to "Aterrizado" per the invariant in `.claude/rules/ci-cd.md`. Add a dedicated H3 documenting the job's scope, what it covers, what it explicitly does not (D2 informative, Claude Code runtime), and the synthetic-policy drift. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…chestrator Each of the 5 scenarios in bin/_selftest.py repeated the same deny phase (exit 2 + permissionDecision deny check) and allow phase (exit 0 check) boilerplate. Extracting two small helpers (check_deny, check_allow) removes ~30 lines of duplication and makes scenario intent more readable without hiding what each scenario asserts. Pre-PR simplify pass (CLAUDE.md regla #7 satisfied: 5 instances). 829 passed + 1 skipped — no regression. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
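The two helpers might look roughly like this. A minimal sketch, assuming the deny phase expects exit code 2 plus a `permissionDecision` of `"deny"` in the hook's JSON stdout (looked up under `hookSpecificOutput` when present, as in Claude Code's PreToolUse output), and that each helper returns `None` or a diagnostic string; the signatures are hypothetical.

```python
import json
import subprocess
from typing import Optional


def _decision(proc: subprocess.CompletedProcess) -> Optional[str]:
    """Best-effort extraction of permissionDecision from hook stdout (assumed JSON)."""
    try:
        data = json.loads(proc.stdout or "{}")
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    inner = data.get("hookSpecificOutput", data)
    return inner.get("permissionDecision") if isinstance(inner, dict) else None


def check_deny(proc: subprocess.CompletedProcess, label: str) -> Optional[str]:
    """Diagnostic unless `proc` exited 2 with permissionDecision == 'deny'."""
    if proc.returncode != 2:
        return f"{label}: expected exit 2, got {proc.returncode}"
    if _decision(proc) != "deny":
        return f"{label}: permissionDecision is not 'deny'"
    return None


def check_allow(proc: subprocess.CompletedProcess, label: str) -> Optional[str]:
    """Diagnostic unless `proc` exited 0."""
    if proc.returncode != 0:
        return f"{label}: expected exit 0, got {proc.returncode}"
    return None
```

Returning a diagnostic instead of raising keeps each scenario a flat sequence of `check_deny(...)` / `check_allow(...)` calls whose first non-`None` result becomes the `[fail]` line.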

Close F3 in the standard docs-sync surfaces:

- ROADMAP.md: F row at 3/4 branches, F3 row → ✅, full progress block under § F.
- HANDOFF.md: §1 snapshot updated (F3 closed, F4 next), §9 next-branch pointer flipped to F4, new §21 with F3 state (deliverables, scenarios, out-of-scope, adjustments).
- MASTER_PLAN.md § Rama F3: stub → realized decisions (A1.b shape, A2 functional-critical subset, A3 tmpdir + cli-tool, A4 exit + tokens, A5 single-matrix CI job, A6 no Claude runtime), implementation adjustments documented (fnmatch literal vs recursive, docs_sync_rules double-key contract, ci-cd.md H3 placement), drift open post-F3.
- docs/ARCHITECTURE.md § 10: new "Selftest end-to-end (entregado en F3)" subsection inside the three-level Testing section. Documents wrapper + orchestrator + scenarios + CI + drift.

Pre-PR gate (D4 dogfooding) satisfied: ROADMAP + HANDOFF in diff; no conditional triggers apply (bin/** + .github/** outside the rules).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Adds an end-to-end “selftest” harness that generates a synthetic project with the real generator and runs several functional-critical hooks against it, wiring this into CI and documenting the new testing layer.

Changes:

Introduces bin/pos-selftest.sh + bin/_selftest.py to generate a synthetic repo and run D1/D3/D4/D5/D6 gate scenarios.
Adds a pytest harness under bin/tests/ (smoke contract + scenario assertions).
Wires the selftest into GitHub Actions (selftest job) and updates docs (ARCHITECTURE/ROADMAP/MASTER_PLAN/HANDOFF + CI/CD rules).

Reviewed changes

Copilot reviewed 6 out of 10 changed files in this pull request and generated 3 comments.


File	Description
docs/ARCHITECTURE.md	Documents the new selftest testing layer and its scope.
bin/pos-selftest.sh	Adds a stable bash entrypoint delegating to the Python orchestrator.
bin/_selftest.py	Implements scenario orchestration: generate synthetic repo, override policy sections, invoke real hooks, assert outcomes.
bin/tests/test_selftest_smoke.py	Smoke tests for wrapper presence/shape + wrapper execution contract.
bin/tests/test_selftest_scenarios.py	Scenario-level contract assertions against orchestrator stdout.
ROADMAP.md	Updates phase tracking and records F3 deliverables.
MASTER_PLAN.md	Expands F3 scope/results documentation.
HANDOFF.md	Updates snapshot/next-branch status and F3 summary.
.github/workflows/ci.yml	Adds a dedicated `selftest` job running `pytest bin/tests -q`.
.claude/rules/ci-cd.md	Updates CI/CD rules to reflect the new selftest job and scope.


Copilot · 2026-04-26T17:58:20Z

+Deliverables:
+
+- `bin/pos-selftest.sh` (9 lines): minimal bash wrapper (`#!/usr/bin/env bash` + `set -euo pipefail` + delegates to `python3 bin/_selftest.py`). It contains no logic; it is a stable entrypoint that tests + CI consume without depending on an absolute path.
+- `bin/_selftest.py` (~344 lines): stdlib orchestrator (no external dependencies). For each scenario: creates a tmpdir, runs `npx tsx generator/run.ts --profile questionnaire/profiles/cli-tool.yaml --out <tmpdir>` to generate a synthetic project, overwrites `synthetic/policy.yaml` with the minimal section the scenario needs, initializes the synthetic project as a git repo (`git init -b main` + baseline commit), and invokes the real hook (`hooks/<name>.py`) via subprocess with a JSON payload. Asserts the exit code + presence of tokens in stdout/stderr/files. Prints `[ok] D{N} {name}` or `[fail] D{N} {name}: <diag>`. Exits 0/1 on pass/fail.


The docs pin an approximate line count for bin/_selftest.py ("~344 líneas"), but the file currently differs and will naturally drift over time. To keep the docs from going stale, consider removing the line count or describing it more generically (e.g., "~300 lines", "~320 lines", or "a few hundred lines").

Suggested change

- `bin/_selftest.py` (~344 lines): stdlib orchestrator (no external dependencies). For each scenario: creates a tmpdir, runs `npx tsx generator/run.ts --profile questionnaire/profiles/cli-tool.yaml --out <tmpdir>` to generate a synthetic project, overwrites `synthetic/policy.yaml` with the minimal section the scenario needs, initializes the synthetic project as a git repo (`git init -b main` + baseline commit), and invokes the real hook (`hooks/<name>.py`) via subprocess with a JSON payload. Asserts the exit code + presence of tokens in stdout/stderr/files. Prints `[ok] D{N} {name}` or `[fail] D{N} {name}: <diag>`. Exits 0/1 on pass/fail.

- `bin/_selftest.py` (a few hundred lines): stdlib orchestrator (no external dependencies). For each scenario: creates a tmpdir, runs `npx tsx generator/run.ts --profile questionnaire/profiles/cli-tool.yaml --out <tmpdir>` to generate a synthetic project, overwrites `synthetic/policy.yaml` with the minimal section the scenario needs, initializes the synthetic project as a git repo (`git init -b main` + baseline commit), and invokes the real hook (`hooks/<name>.py`) via subprocess with a JSON payload. Asserts the exit code + presence of tokens in stdout/stderr/files. Prints `[ok] D{N} {name}` or `[fail] D{N} {name}: <diag>`. Exits 0/1 on pass/fail.

Copilot · 2026-04-26T17:58:21Z

+**Delivered files**:
+
+- `bin/pos-selftest.sh`: bash wrapper (`#!/usr/bin/env bash` + `set -euo pipefail` + delegates to `python3 bin/_selftest.py`). 9 lines. No logic.
+- `bin/_selftest.py`: stdlib Python orchestrator (~344 lines). Per scenario: tmpdir + real generator + overwrites the minimal section of `synthetic/policy.yaml` + sets up a git repo (`git init -b main` + baseline commit) + invokes the real hook via subprocess + asserts exit + tokens.


This section hardcodes an approximate line count for bin/_selftest.py ("~344 líneas"). Since the file length changes as scenarios evolve, this is likely to go stale; consider removing the line count or making it less specific.

Suggested change

- `bin/_selftest.py`: stdlib Python orchestrator (~344 lines). Per scenario: tmpdir + real generator + overwrites the minimal section of `synthetic/policy.yaml` + sets up a git repo (`git init -b main` + baseline commit) + invokes the real hook via subprocess + asserts exit + tokens.

- `bin/_selftest.py`: stdlib Python orchestrator. Per scenario: tmpdir + real generator + overwrites the minimal section of `synthetic/policy.yaml` + sets up a git repo (`git init -b main` + baseline commit) + invokes the real hook via subprocess + asserts exit + tokens.

Copilot · 2026-04-26T17:58:21Z

+**Deliverables**:
+
+- `bin/pos-selftest.sh` (9 lines): minimal bash wrapper (`#!/usr/bin/env bash` + `set -euo pipefail` + delegates to `python3 bin/_selftest.py`). No logic; stable entrypoint.
+- `bin/_selftest.py` (~344 lines): stdlib Python orchestrator. Per scenario: tmpdir + `npx tsx generator/run.ts --profile cli-tool.yaml --out <tmpdir>` to generate a synthetic project, overwrites the minimal section of `synthetic/policy.yaml` the scenario needs, sets up the synthetic project as a git repo (`git init -b main` + baseline commit), invokes the real hook (`hooks/<name>.py`) via subprocess with a JSON payload, asserts exit + tokens in stdout/stderr/files. Prints `[ok] D{N} {name}` or `[fail] D{N} {name}: <diag>`.


The snapshot notes bin/_selftest.py as "~344 líneas", but this value will drift as the orchestrator changes. Consider dropping the line count (or making it intentionally coarse) to keep the docs accurate long-term.

Suggested change

- `bin/_selftest.py` (~344 lines): stdlib Python orchestrator. Per scenario: tmpdir + `npx tsx generator/run.ts --profile cli-tool.yaml --out <tmpdir>` to generate a synthetic project, overwrites the minimal section of `synthetic/policy.yaml` the scenario needs, sets up the synthetic project as a git repo (`git init -b main` + baseline commit), invokes the real hook (`hooks/<name>.py`) via subprocess with a JSON payload, asserts exit + tokens in stdout/stderr/files. Prints `[ok] D{N} {name}` or `[fail] D{N} {name}: <diag>`.

- `bin/_selftest.py` (stdlib Python orchestrator): per scenario, tmpdir + `npx tsx generator/run.ts --profile cli-tool.yaml --out <tmpdir>` to generate a synthetic project, overwrites the minimal section of `synthetic/policy.yaml` the scenario needs, sets up the synthetic project as a git repo (`git init -b main` + baseline commit), invokes the real hook (`hooks/<name>.py`) via subprocess with a JSON payload, asserts exit + tokens in stdout/stderr/files. Prints `[ok] D{N} {name}` or `[fail] D{N} {name}: <diag>`.

…3 drift

F3 documented an open drift: templates/policy.yaml.hbs + generator/renderers/policy.ts still emit the pre-D5b shape, sidestepped in F3 via per-scenario overlays in bin/_selftest.py. Open the stub branch slot now so the carry-over has a concrete home:

- MASTER_PLAN.md § Rama F3b: full stub (scope, context to read, decisions to close in Fase -1, exit criterion, rationale for not delivering it in F3). Placing F3b mirrors the precedent of refactor/d5-policy-loader as Rama D5b.
- ROADMAP.md: new row between F3 and F4. F3 row updated with PR #27.
- HANDOFF.md § 7 Gotchas: the drift bullet now points to the stub instead of saying "diferida a rama propia post-D6".

No code changes. The stub is informational; Fase -1 happens when F4 closes (or when a consumer requires the post-D5b shape in generator output).

Copilot flagged "~344 líneas" hardcoded for bin/_selftest.py in 3 docs (ROADMAP § F3, HANDOFF §21, MASTER_PLAN § Rama F3). Already stale: the file is 321 lines after the simplify pass (bb146bc), and will keep drifting as scenarios evolve. Drop the precise count entirely — the description carries the magnitude signal without the maintenance debt. Wrapper count ("9 líneas" for bin/pos-selftest.sh) kept: minimal-by-design, stable as long as the wrapper just exec's the orchestrator. Triage: 3 inline comments, all FIX (low value / trivial effort). 0 issue/conversation comments. Review-body was the PR overview summary.

javiAI · 2026-04-26T18:15:11Z

Address review (1/1 reviewer = Copilot, 3 inline comments, all same suggestion).

FIX (3) — commit 4fbffe1:

Dropped the hardcoded "~344 líneas" for bin/_selftest.py in ROADMAP, HANDOFF, and MASTER_PLAN. It was already stale (the file is 321 lines after the simplify pass bb146bc) and naturally drifts as scenarios change. The description carries the magnitude signal without the maintenance debt.

KEPT — wrapper bin/pos-selftest.sh at 9 lines: stable-by-design (no logic, just exec).

SKIP / DISCUSS: none. Review-body was the PR overview summary. 0 issue comments.

Javier and others added 15 commits April 26, 2026 16:52

Copilot AI review requested due to automatic review settings April 26, 2026 17:53

Copilot started reviewing on behalf of javiAI April 26, 2026 17:54

Copilot AI reviewed Apr 26, 2026


Javier added 2 commits April 26, 2026 20:00

javiAI merged commit 2595bf9 into main Apr 26, 2026
7 checks passed

javiAI deleted the feat/f3-selftest-end-to-end branch April 26, 2026 18:17

javiAI merged 17 commits into main from feat/f3-selftest-end-to-end

2 participants
