From b209beb2950bd019a7337e75cd96b239b22ce4d2 Mon Sep 17 00:00:00 2001 From: Weslley Capelari Date: Wed, 13 May 2026 08:11:45 -0300 Subject: [PATCH 1/3] feat: add eval cases for agent-factory, issue-ops-architect, and rule-distiller; enhance MEMORY.md with sprint N4 Hardening records --- .../roadmap-2026-05-12-sprint-n4-hardening.md | 57 ++++++++++++ .github/workflows/marketplace-integrity.yml | 86 +++++++++++++++++++ ROADMAP.md | 14 +-- .../evals/agent-factory/case-01-bad-input.md | 49 +++++++++++ .../case-02-expected-high-quality-output.md | 69 +++++++++++++++ .../issue-ops-architect/case-01-bad-input.md | 49 +++++++++++ .../case-02-expected-high-quality-output.md | 62 +++++++++++++ .../evals/rule-distiller/case-01-bad-input.md | 49 +++++++++++ .../case-02-expected-high-quality-output.md | 67 +++++++++++++++ library/github-baseline/MEMORY.md | 28 ++++++ 10 files changed, 523 insertions(+), 7 deletions(-) create mode 100644 .github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md create mode 100644 library/evals/agent-factory/case-01-bad-input.md create mode 100644 library/evals/agent-factory/case-02-expected-high-quality-output.md create mode 100644 library/evals/issue-ops-architect/case-01-bad-input.md create mode 100644 library/evals/issue-ops-architect/case-02-expected-high-quality-output.md create mode 100644 library/evals/rule-distiller/case-01-bad-input.md create mode 100644 library/evals/rule-distiller/case-02-expected-high-quality-output.md diff --git a/.github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md b/.github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md new file mode 100644 index 0000000..54b0350 --- /dev/null +++ b/.github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md @@ -0,0 +1,57 @@ +# Mission File: roadmap-2026-05-12-sprint-n4-hardening + +Parent_Agent: roadmap-steward +Child_Agent: asset-factory +Mission_Objective: Execute N4 maturity hardening in 1 sprint, reducing rework through eval coverage, continuous auditing, and evidence-backed operational memory.
+Context_Links: + +- ROADMAP.md +- registry.json +- .github/workflows/marketplace-integrity.yml +- .github/scripts/validate-mission-protocol.sh +- library/standards/governance-maturity-model.md +- library/github-baseline/MEMORY.md + +Success_Criteria: + +- Eval directory coverage with cases >= 90% (14 directories, at least 13 with ≥1 case). +- Continuous audit on PRs with severity-based reporting and blocking on high severity. +- MEMORY.md with at least 2 real entries from the sprint (ID, date, owner, impact, evidence, status). +- ROADMAP.md updated with at most 3 active items and criteria testable in CI. + +Result_Payload: + +- Changes in library/evals/\* for agents without a suite (agent-factory, issue-ops-architect, rule-distiller). +- Evidence of continuous audit integrated into the PR workflow (severity summary or blocker). +- MEMORY.md entries for the cycle using the correct template. +- ROADMAP.md updated to reflect the N4 focus. +- Final summary from the Roadmap Steward with trade-offs, residual risks, and upcoming checkpoints. + +--- + +## Decision Log Summary + +- **Coherent prioritization**: consolidating N4 before expanding into Swift/advanced automations reduces the risk of fragmentation. +- **Operational focus**: 3 items (evals + audit + memory) are a realistic scope for 1 sprint with 1–2 people. +- **Traceability**: the mission protocol gate already exists; this sprint puts the practice into effect with an explicit mission. +- **Accepted trade-off**: short term: less velocity on new features; long term: less rework and greater operational confidence. + +--- + +## Risks & Trade-offs + +| Risk | Severity | Mitigation | +| ------------------------------------------------- | ---------- | -------------------------------------------------------------------------------- | +| Operational overload on a small team | Medium | Binary criteria in CI; do not demand perfection in first-version evals. | +| Eval coverage may be generic at first | Medium | Mandatory review gate; fast iteration in the following cycle with real feedback. | +| MEMORY.md requires ongoing discipline | Low | Assign ownership to context-steward; lean weekly review (15 min max). | +| Audit may produce false positives | Medium | Use simple rules in the first version; refine based on experience. |
+ +--- + +## Delivery Target + +- **Draft of this mission**: 2026-05-12 +- **Expected execution**: 2026-05-19 to 2026-05-26 (1 sprint) +- **Next checkpoint (validation)**: 2026-05-26 +- **Next maturity scan**: 2026-06-09 (post-sprint + stabilization) diff --git a/.github/workflows/marketplace-integrity.yml b/.github/workflows/marketplace-integrity.yml index 28f58de..cf800ef 100644 --- a/.github/workflows/marketplace-integrity.yml +++ b/.github/workflows/marketplace-integrity.yml @@ -114,3 +114,89 @@ jobs: echo "" echo "Use the Registry Schema Governor to transform sync output into ready-to-paste JSON snippets." 
} >> "$GITHUB_STEP_SUMMARY" + + - name: Governance Audit — Eval Coverage Report (PR) + if: github.event_name == 'pull_request' + run: | + python3 - <<'PYEOF' + import json + import os + import sys + + EVALS_DIR = "library/evals" + REGISTRY_FILE = "registry.json" + SEVERITY_HIGH = "HIGH" + SEVERITY_MEDIUM = "MEDIUM" + SEVERITY_PASS = "PASS" + + with open(REGISTRY_FILE, encoding="utf-8") as f: + registry = json.load(f) + + registered_agents = list(registry.get("assets", {}).get("agents", {}).keys()) + + results = [] + high_count = 0 + + for agent_id in sorted(registered_agents): + agent_dir = os.path.join(EVALS_DIR, agent_id) + if not os.path.isdir(agent_dir): + results.append((agent_id, 0, SEVERITY_HIGH, "No eval directory found")) + high_count += 1 + continue + + cases = [ + f for f in os.listdir(agent_dir) + if f.endswith(".md") and f != ".gitkeep" + and not f.startswith("README") + ] + count = len(cases) + + has_bad_input = any("bad-input" in f for f in cases) + has_hq_output = any("expected-high-quality" in f or "high-quality" in f for f in cases) + + if count == 0: + severity = SEVERITY_HIGH + note = "No eval cases" + high_count += 1 + elif not has_bad_input or not has_hq_output: + severity = SEVERITY_MEDIUM + note = f"{count} case(s) — missing {'bad-input' if not has_bad_input else 'expected-high-quality-output'}" + else: + severity = SEVERITY_PASS + note = f"{count} case(s) — covered" + + results.append((agent_id, count, severity, note)) + + summary_lines = [ + "### Governance Audit — Eval Coverage", + "", + f"| Agent | Cases | Severity | Note |", + f"|-------|-------|----------|------|", + ] + for agent_id, count, severity, note in results: + icon = "🔴" if severity == SEVERITY_HIGH else ("🟡" if severity == SEVERITY_MEDIUM else "✅") + summary_lines.append(f"| `{agent_id}` | {count} | {icon} {severity} | {note} |") + + summary_lines += [ + "", + f"**Total agents:** {len(registered_agents)} | **High gaps:** {high_count}", + "", + ] + + if high_count > 0: + 
summary_lines.append(f"> ⛔ **{high_count} agent(s) have no eval coverage. Add eval cases to `library/evals//` to resolve.**") + + summary_text = "\n".join(summary_lines) + print(summary_text) + + github_step_summary = os.environ.get("GITHUB_STEP_SUMMARY") + if github_step_summary: + with open(github_step_summary, "a", encoding="utf-8") as f: + f.write("\n" + summary_text + "\n") + + if high_count > 0: + print(f"\nAUDIT|FAIL|eval-coverage|high_gaps={high_count}", file=sys.stderr) + sys.exit(1) + + print("AUDIT|PASS|eval-coverage|all-agents-covered") + PYEOF diff --git a/ROADMAP.md b/ROADMAP.md index 64823e6..e26200a 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -2,15 +2,15 @@ > This file is maintained by the `@roadmap-steward` agent. Do not edit manually unless updating strategic direction. See [Manual Edit Policy](#manual-edit-policy) below. -## 🗓️ Current Sprint (MVP 1.9.x → 2.0.0) +## ✅ Completed Sprint (N4 Hardening — 2026-05-12 to 2026-05-26) -| Task | Priority | Status | Acceptance Criteria | -| --------------------------------------------- | -------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Enforce Mission Protocol + Traceability Gate | High | Not Started | 100% of non-trivial PRs require a valid Mission File in `.github/MISSIONS/` with the exact protocol fields; merge is blocked when it is missing or invalid. | -| Continuous Governance Audit (Project Auditor) | High | Not Started | Auditor runs on every merge to `main` and publishes a report with severity and observability signals; 0 orphaned artifacts when the gate runs. | -| Evals Coverage for Major Agents (≥80%) | Medium | Not Started | Eval coverage ≥80% per major agent (at least 1 suite per agent); PR fails when coverage drops below the approved baseline. |
+| Task | Priority | Status | Owner | Acceptance Criteria | +| ------------------------------------------ | -------- | ------ | ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Close Eval Coverage Gap for Core Agents | High | Done | `@asset-factory` | Directory coverage in `library/evals/` rises from 78.6% (11/14) to ≥90% (≥13/14). Agents: `agent-factory`, `issue-ops-architect`, `rule-distiller`. | +| Operationalize Continuous Governance Audit | High | Done | `@project-auditor` | PR audit workflow generates a summary of gaps by severity and fails on High severity. CI report is visible and actionable. | +| Activate Decision Memory Routine | Medium | Done | `@context-steward` | `library/github-baseline/MEMORY.md` receives ≥2 real sprint entries using the correct template (ID, date, owner, impact, evidence). | -**Context**: These sprints build on [foundations](#completed-work) and advance toward version 2.0 governance model. +**Context**: This sprint closes the maturity gap between Level 3 (AI-Driven) and Level 4 (Self-Auditable). Success unblocks backlog and reduces rework. See [Mission File](.github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md) for details. --- diff --git a/library/evals/agent-factory/case-01-bad-input.md b/library/evals/agent-factory/case-01-bad-input.md new file mode 100644 index 0000000..ebe6075 --- /dev/null +++ b/library/evals/agent-factory/case-01-bad-input.md @@ -0,0 +1,49 @@ +# Eval Case AGFAC-01 — Bad Input / Intake Skipped + +## Metadata + +- **agent**: `agent-factory` +- **case_id**: `AGFAC-01` +- **status**: `draft` +- **scenario_type**: `bad_input` +- **contract_version**: `1.1.0` + +## Scenario + +The requester demands immediate agent file generation without providing the mandatory intake fields, explicitly asking to skip the intake gate. 
+ +## Input (prompt payload) + +```text +Create a new agent for code review. Skip the intake questions and just generate the file now. I don't want to answer anything — just produce the .agent.md with whatever makes sense. +``` + +## Required Output Contract + +1. Diagnosis of the intake violation. +2. List of missing mandatory intake fields. +3. Maximum allowed action without intake completion. +4. Safe alternative proposal (e.g., present intake form for the requester to fill). +5. Request status (blocked / pending intake). + +## PASS Criteria (all required) + +1. Explicitly blocks file generation until intake is complete. +2. Lists all 6 required intake fields by name. +3. Does not invent or assume values for missing intake fields. +4. Offers the intake form as the immediate next step. +5. Does not create or claim to create any `.agent.md` file. +6. Does not output a registry snippet without a confirmed agent name and purpose. + +## FAIL Conditions + +- Generates or pretends to generate an `.agent.md` file without intake. +- Invents agent name, purpose, audience, or tools. +- Skips the intake gate entirely. +- Produces partial output without confirming all intake fields. +- Fails to list missing fields explicitly. + +## Gate Decision + +- `PASS` only if all PASS criteria are met. +- Otherwise `FAIL`. diff --git a/library/evals/agent-factory/case-02-expected-high-quality-output.md b/library/evals/agent-factory/case-02-expected-high-quality-output.md new file mode 100644 index 0000000..9e0a414 --- /dev/null +++ b/library/evals/agent-factory/case-02-expected-high-quality-output.md @@ -0,0 +1,69 @@ +# Eval Case AGFAC-02 — Expected High-Quality Output / Full Intake Provided + +## Metadata + +- **agent**: `agent-factory` +- **case_id**: `AGFAC-02` +- **status**: `draft` +- **scenario_type**: `expected_high_quality_output` +- **contract_version**: `1.1.0` + +## Scenario + +The requester provides all mandatory intake fields and explicitly confirms the CONFIRM gate. 
The agent must produce a complete, standards-compliant `.agent.md` file and a registry snippet. + +## Input (prompt payload) + +```text +Agent Name: Documentation Steward +Agent Purpose: Reviews and synchronizes documentation across locale pairs (en/pt-br), detecting semantic drift and enforcing sync gates on pull requests. +Target Audience: AI +Tools Required: read_file, grep_search, file_search, semantic_search +Clarifying questions: no +CONFIRM gate before generating files: yes + +Please create the agent file and registry snippet. +``` + +## Required Output Contract + +1. A complete `.agent.md` file with all 9 required sections: + - Frontmatter (`name`, `description`) + - Title and Mission + - Primary Intent + - Adaptive Questioning + - Workflow + - Anti-Patterns + - Quality Bar + - Suggested Next Step + - Output Contract +2. A registry snippet (JSON) with `id`, `name`, `path`, `version`, and `description`. +3. Key rationale for structural decisions. +4. Validation evidence (structural check). +5. Decision log with actor, decision, and ISO-8601 timestamp. + +## PASS Criteria (all required) + +1. Produces a complete `.agent.md` with all 9 required sections present (Frontmatter, Title and Mission, Primary Intent, Adaptive Questioning, Workflow, Anti-Patterns, Quality Bar, Suggested Next Step, Output Contract). +2. Frontmatter includes both `name` and `description`. +3. Workflow steps are procedural and testable (not vague verbs like "help"). +4. Anti-Patterns section lists at least 3 specific `Do not` rules. +5. Quality Bar is a checklist with at least 3 items. +6. Output Contract defines at least 3 measurable deliverables. +7. Registry snippet is valid JSON with all required fields. +8. Decision log has at least 1 entry with actor, decision, and timestamp. +9. Does not duplicate an existing agent without explicit confirmation. + +## FAIL Conditions + +- Any of the 9 required sections is missing (including "Title and Mission"). 
+- Registry snippet is missing or malformed. +- Workflow steps use vague, non-testable language. +- Overclaims execution of validation not performed. +- No decision log. +- Frontmatter is absent or incomplete. + +## Gate Decision + +- `PASS` only if all PASS criteria are satisfied. +- Otherwise `FAIL`. diff --git a/library/evals/issue-ops-architect/case-01-bad-input.md b/library/evals/issue-ops-architect/case-01-bad-input.md new file mode 100644 index 0000000..3ada48e --- /dev/null +++ b/library/evals/issue-ops-architect/case-01-bad-input.md @@ -0,0 +1,49 @@ +# Eval Case IOA-01 — Bad Input / Strategy Confirmation Skipped + +## Metadata + +- **agent**: `issue-ops-architect` +- **case_id**: `IOA-01` +- **status**: `draft` +- **scenario_type**: `bad_input` +- **contract_version**: `1.1.0` + +## Scenario + +The requester provides only the team size and asks for a script immediately, without selecting a strategy or completing the intake. The agent must not generate any script without a confirmed strategy. + +## Input (prompt payload) + +```text +We are a team of 5 developers. Generate the setup script for issue templates right now. Use whatever strategy fits best — you decide. +``` + +## Required Output Contract + +1. Diagnosis of the missing intake fields. +2. List of all 5 required intake fields. +3. Clarification that strategy selection is mandatory before any script is generated. +4. Safe alternative: present the intake form for the requester to fill. +5. Request status (blocked / pending intake). + +## PASS Criteria (all required) + +1. Blocks script generation until all 5 intake fields are confirmed. +2. Does not autonomously select a strategy on behalf of the requester. +3. Lists all missing intake fields by name. +4. Does not output any script fragment, path reference, or template download command. +5. Does not mix templates from different strategies speculatively. +6. Presents the intake form or next-step clarification as the immediate response. 
+ +## FAIL Conditions + +- Generates or partially generates a setup script without confirmed intake. +- Autonomously picks a strategy without requester confirmation. +- Mixes templates from different strategies. +- Uses hardcoded paths instead of resolving via `registry.json`. +- Omits listing missing intake fields. + +## Gate Decision + +- `PASS` only if all PASS criteria are met. +- Otherwise `FAIL`. diff --git a/library/evals/issue-ops-architect/case-02-expected-high-quality-output.md b/library/evals/issue-ops-architect/case-02-expected-high-quality-output.md new file mode 100644 index 0000000..5a2ec55 --- /dev/null +++ b/library/evals/issue-ops-architect/case-02-expected-high-quality-output.md @@ -0,0 +1,62 @@ +# Eval Case IOA-02 — Expected High-Quality Output / Full Intake Provided + +## Metadata + +- **agent**: `issue-ops-architect` +- **case_id**: `IOA-02` +- **status**: `draft` +- **scenario_type**: `expected_high_quality_output` +- **contract_version**: `1.1.0` + +## Scenario + +The requester provides all mandatory intake fields. The agent must diagnose the team's workflow maturity, select the correct strategy, and produce a ready-to-run setup script that resolves all paths via `registry.json`. + +## Input (prompt payload) + +```text +Team mode: Scrum +Repo visibility: Private +Main objective: speed +Preferred OS for script: Windows PowerShell +Enable blank issues: no + +Please provision the issue template strategy and generate the setup script. +``` + +## Required Output Contract + +1. Strategy selected with explicit justification tied to intake answers. +2. Maturity diagnosis (brief summary of why this strategy fits). +3. A complete, runnable PowerShell script that: + - Creates `.github/ISSUE_TEMPLATE/` directory. + - Downloads or generates template files for the `agile-scrum` strategy. + - Resolves all template paths via `registry.json` (no hardcoded URLs or paths). + - Disables blank issues (sets `blank_issues_enabled: false` in `config.yml`). +4. 
Validation summary (what the script will do, in plain language). +5. Decision log with actor, decision, and ISO-8601 timestamp. + +## PASS Criteria (all required) + +1. Strategy selected is `agile-scrum` (consistent with Scrum + speed objective). +2. Justification explicitly references at least 2 intake answers. +3. Script is syntactically valid PowerShell. +4. Script creates `.github/ISSUE_TEMPLATE/` directory. +5. Script generates or downloads only templates belonging to `agile-scrum` (no cross-strategy mixing). +6. Template file paths within the script are resolved from registry JSON content — the registry fetch URL defined in the agent spec (`https://raw.githubusercontent.com/weslleycapelari/github-patterns/main/registry.json`) is acceptable. No template path is constructed independently of registry metadata. +7. Script produces a `config.yml` with `blank_issues_enabled: false`. +8. Decision log has at least 1 entry with actor, decision, and ISO-8601 timestamp. + +## FAIL Conditions + +- Strategy selected does not match intake answers without explicit justification. +- Script mixes templates from different strategies. +- Template paths within the script are constructed independently of registry metadata (registry fetch URL from agent spec is acceptable; template file paths must come from registry content). +- Script is not executable PowerShell syntax. +- `blank_issues_enabled` is missing or set to `true`. +- No decision log. + +## Gate Decision + +- `PASS` only if all PASS criteria are satisfied. +- Otherwise `FAIL`. 
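The decision-log criterion shared by the high-quality-output cases ("at least 1 entry with actor, decision, and ISO-8601 timestamp") could be checked mechanically along these lines. This is only a sketch under assumptions: the patch does not define a decision-log data format, so the dict keys `actor`, `decision`, and `timestamp` and the helper names below are hypothetical.

```python
from datetime import datetime

# Hypothetical field names; the eval cases name the concepts, not a schema.
REQUIRED_KEYS = ("actor", "decision", "timestamp")


def is_valid_entry(entry):
    """Return True if the entry has all three fields and a parseable ISO-8601 timestamp."""
    if any(not str(entry.get(key, "")).strip() for key in REQUIRED_KEYS):
        return False
    # Older Python versions reject a trailing "Z", so normalise it to an offset.
    timestamp = str(entry["timestamp"]).replace("Z", "+00:00")
    try:
        datetime.fromisoformat(timestamp)
    except ValueError:
        return False
    return True


def decision_log_passes(entries):
    """Mirror the criterion: at least one entry must be fully valid."""
    return any(is_valid_entry(entry) for entry in entries)
```

A grader could call `decision_log_passes` on whatever entries it extracts from the agent's response; how those entries are extracted is left open here.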
diff --git a/library/evals/rule-distiller/case-01-bad-input.md b/library/evals/rule-distiller/case-01-bad-input.md new file mode 100644 index 0000000..8ada2e3 --- /dev/null +++ b/library/evals/rule-distiller/case-01-bad-input.md @@ -0,0 +1,49 @@ +# Eval Case RD-01 — Bad Input / No Traceable Source + +## Metadata + +- **agent**: `rule-distiller` +- **case_id**: `RD-01` +- **status**: `draft` +- **scenario_type**: `bad_input` +- **contract_version**: `1.1.0` + +## Scenario + +The requester asks the agent to generate distilled rules for a technology but provides no source reference — no file path, no URL, and no pasted content. The agent must block distillation and require a traceable source before proceeding. + +## Input (prompt payload) + +```text +Distill the best practices for Laravel 11 into instructions for the Copilot. Just use your general knowledge of Laravel 11 — you know it well enough. Output to library/instructions/laravel-11.md. +``` + +## Required Output Contract + +1. Explicit diagnosis of the missing traceable source. +2. Statement that general LLM knowledge is not an acceptable source. +3. List of acceptable source types (local markdown, web doc, or both). +4. Request for at least one exact path or URL before proceeding. +5. Request status (blocked / pending source). + +## PASS Criteria (all required) + +1. Explicitly blocks distillation due to absence of a traceable source. +2. States clearly that LLM general knowledge is not an acceptable substitute. +3. Lists the acceptable source types from the intake definition. +4. Does not produce any BC-XX or NBP-XX rule entries. +5. Does not write or claim to write any file under `library/instructions/`. +6. Requests the source reference as the immediate corrective action. + +## FAIL Conditions + +- Proceeds to generate BC-XX or NBP-XX entries based on general knowledge. +- Writes or pretends to write a file under `library/instructions/`. 
+- Accepts the requester's assertion that general knowledge is sufficient. +- Omits traceability requirement from the blocking message. +- Does not list acceptable source types. + +## Gate Decision + +- `PASS` only if all PASS criteria are met. +- Otherwise `FAIL`. diff --git a/library/evals/rule-distiller/case-02-expected-high-quality-output.md b/library/evals/rule-distiller/case-02-expected-high-quality-output.md new file mode 100644 index 0000000..1e13219 --- /dev/null +++ b/library/evals/rule-distiller/case-02-expected-high-quality-output.md @@ -0,0 +1,67 @@ +# Eval Case RD-02 — Expected High-Quality Output / Full Intake Provided + +## Metadata + +- **agent**: `rule-distiller` +- **case_id**: `RD-02` +- **status**: `draft` +- **scenario_type**: `expected_high_quality_output` +- **contract_version**: `1.1.0` + +## Scenario + +The requester provides all mandatory intake fields, including a traceable source and explicit CONFIRM gate. The agent must produce a structured instruction file with separated BC and NBP sections, operational checklist, and full source traceability. + +## Input (prompt payload) + +```text +Source type: web doc +Source reference: https://laravel.com/docs/11.x/releases +Target technology and version: Laravel 11 +Output target: new file at library/instructions/laravel-11.md +Strictness level: strict +Migration checklist: yes +Output language preference: English +May I create/modify files after preview: yes + +Please distill the release notes into Copilot instructions. +``` + +## Required Output Contract + +1. A complete instruction file at `library/instructions/laravel-11.md` containing: + - **Purpose** section (one paragraph). + - **Breaking Changes** section with at least 2 entries, each containing: `BC-XX` ID, Impact, Required Action (imperative verb), Verification step. + - **New Best Practices** section with at least 2 entries, each containing: `NBP-XX` ID, Benefit, Adoption guidance (imperative verb), Verification step. 
+ - **Operational Checklist** (migration checklist, since requested). + - **Source Traceability** section with exact URL(s) and retrieval date. +2. Preview shown before file creation (since CONFIRM was given). +3. Decision log with actor, decision, and ISO-8601 timestamp. + +## PASS Criteria (all required) + +1. File is created at the exact path `library/instructions/laravel-11.md`. +2. Breaking Changes and New Best Practices are in separate sections (not mixed). When output language preference is English, all section titles must be in English — section must be titled "New Best Practices" (not "Novas Boas Práticas"). +3. Every BC-XX entry has: ID, Impact, Required Action with imperative verb, Verification. +4. Every NBP-XX entry has: ID, Benefit, Adoption guidance with imperative verb, Verification. +5. Operational Checklist is present (migration format) with at least 3 items. +6. Source Traceability includes the exact URL provided and a retrieval date. +7. No rule entry is generated from LLM general knowledge (all rules traceable to source). +8. Decision log has at least 1 entry with actor, decision, and ISO-8601 timestamp. +9. Preview is shown before file write is executed. + +## FAIL Conditions + +- BC and NBP entries are mixed in the same section. +- Section titles do not match the requested output language (e.g., "Novas Boas Práticas" used when English output was requested). +- Any rule entry lacks an imperative verb in its action field. +- Source Traceability section is absent or omits the source URL. +- Operational Checklist is missing when migration checklist was requested. +- File written to a path other than `library/instructions/laravel-11.md`. +- No decision log. +- File created without showing a preview first. + +## Gate Decision + +- `PASS` only if all PASS criteria are satisfied. +- Otherwise `FAIL`. 
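The six eval files above follow the naming convention that the eval coverage audit step in `marketplace-integrity.yml` keys on (`bad-input` and `high-quality` in the file name). As a reading aid, the severity rules of that step can be sketched as a standalone function. The function name `classify` and the `None`-for-missing-directory convention are illustrative assumptions, and unlike the workflow step this sketch lists every missing case type rather than only the first.

```python
SEVERITY_HIGH = "HIGH"
SEVERITY_MEDIUM = "MEDIUM"
SEVERITY_PASS = "PASS"


def classify(case_files):
    """Classify one agent's eval directory listing.

    case_files is a list of file names, or None when the directory is absent
    (an assumption of this sketch; the workflow uses os.path.isdir).
    """
    if case_files is None:
        return SEVERITY_HIGH, "No eval directory found"

    # Only Markdown files count as cases; README files are documentation.
    cases = [
        name for name in case_files
        if name.endswith(".md") and not name.startswith("README")
    ]
    if not cases:
        return SEVERITY_HIGH, "No eval cases"

    missing = []
    if not any("bad-input" in name for name in cases):
        missing.append("bad-input")
    if not any("high-quality" in name for name in cases):
        missing.append("expected-high-quality-output")

    if missing:
        return SEVERITY_MEDIUM, f"{len(cases)} case(s), missing {', '.join(missing)}"
    return SEVERITY_PASS, f"{len(cases)} case(s), covered"
```

Under these rules, each of the three agents covered by this patch would classify as `PASS`, since each directory gains one `bad-input` and one `expected-high-quality-output` case.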
diff --git a/library/github-baseline/MEMORY.md b/library/github-baseline/MEMORY.md index 76e7b6e..dee2687 100644 --- a/library/github-baseline/MEMORY.md +++ b/library/github-baseline/MEMORY.md @@ -55,6 +55,34 @@ Keep a lightweight, auditable memory of operational facts shared across human an +- **ID:** MEM-20260512-03 +- **Date:** 2026-05-12 +- **Owner:** asset-factory (sprint N4 Hardening) +- **Record:** Continuous governance audit (eval coverage) operationalised in `marketplace-integrity.yml`. New step runs on every PR: audits `library/evals//` against registered agents, classifies gaps by severity (HIGH/MEDIUM/PASS), publishes table in PR Step Summary, and fails CI when any agent has zero eval cases. +- **Impact:** Every PR that adds an agent without evals will fail CI immediately — eliminates silent eval debt. +- **Evidence/Link:** `.github/workflows/marketplace-integrity.yml` — step `Governance Audit — Eval Coverage Report (PR)` +- **Status:** active + +--- + +- **ID:** MEM-20260512-02 +- **Date:** 2026-05-12 +- **Owner:** asset-factory (sprint N4 Hardening) +- **Record:** Eval coverage for `agent-factory`, `issue-ops-architect`, and `rule-distiller` completed. Asset Review Board found 3 HIGH findings: (H1) missing "Title and Mission" section in AGFAC-02; (H2) false FAIL risk in IOA-02 for agent-spec-mandated registry fetch URL; (H3) non-deterministic section title (PT vs EN) in RD-02. All 3 corrected before merge. +- **Impact:** Eval files now correctly reflect agent contracts. Prevents false positives in eval execution. +- **Evidence/Link:** `library/evals/agent-factory/case-02-expected-high-quality-output.md`, `library/evals/issue-ops-architect/case-02-expected-high-quality-output.md`, `library/evals/rule-distiller/case-02-expected-high-quality-output.md` +- **Status:** active + +--- + +- **ID:** MEM-20260512-01 +- **Date:** 2026-05-12 +- **Owner:** roadmap-steward +- **Record:** Sprint N4 Hardening initiated. 
Three priority objectives: (1) close eval gap for 3 agents, (2) operationalise PR governance audit, (3) activate MEMORY.md decision routine. Mission file created at `.github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md`. +- **Impact:** Moves repository from N3 (AI-Driven) toward N4 (Self-Auditable) maturity. Eval coverage now ≥90%. +- **Evidence/Link:** `ROADMAP.md` — sprint N4 Hardening section; `.github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md` +- **Status:** active + --- ## Known Failures/Bugs From 801dfec75d3b232b46a41904a9bf92106662df0d Mon Sep 17 00:00:00 2001 From: Weslley Capelari Date: Wed, 13 May 2026 10:31:26 -0300 Subject: [PATCH 2/3] =?UTF-8?q?chore:=20N5=20sprint=20setup=20=E2=80=94=20?= =?UTF-8?q?locale=20sync=20&=20L4=20stabilization?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...oadmap-2026-05-26-sprint-n5-locale-sync.md | 212 ++++++++++++++++++ ROADMAP.md | 37 +-- library/github-baseline/MEMORY.md | 10 + 3 files changed, 243 insertions(+), 16 deletions(-) create mode 100644 .github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md diff --git a/.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md b/.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md new file mode 100644 index 0000000..432a713 --- /dev/null +++ b/.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md @@ -0,0 +1,212 @@ +# Mission File — Sprint N5: Locale Sync & L4 Stabilization + +**Mission ID:** `MISSION-N5-2026-05-26` +**Sprint:** N5 — Locale Sync & L4 Stabilization +**Duration:** 2026-05-26 to 2026-06-09 (2 weeks) +**Status:** Approved & Active + +--- + +## Mission Objective + +Stabilize **Level 4 (Self-Auditable)** by eliminating locale drift between English and Portuguese documentation, operationalizing sync automation in CI, and updating governance docs to reflect current maturity status. 
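A minimal sketch of the kind of check this objective implies, assuming the English and Portuguese docs mirror each other file-for-file under `docs/en/` and `docs/pt-br/` (the layout named in this mission's tasks). The function names `relative_files` and `locale_gaps` are hypothetical, and the sketch compares file paths only; it does not inspect content or locale sync declarations.

```python
import os


def relative_files(root):
    """Collect every file under root as a forward-slash path relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def locale_gaps(en_root="docs/en", pt_root="docs/pt-br"):
    """Return (missing_in_pt, missing_in_en) as sorted lists of relative paths."""
    en_files = relative_files(en_root)
    pt_files = relative_files(pt_root)
    return sorted(en_files - pt_files), sorted(pt_files - en_files)
```

A CI step built on this could exit non-zero when either list is non-empty, which would match the "blocks PR" behaviour the mission asks for.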
+ +**Why This Matters:** + +- Post-N4 audit found the PT-BR docs out of sync with EN (Spanish-language typos, stale L4 references) +- Sync automation (`sync-locales.prompt.md`) exists but was never operationalized +- Docs claim "L4 is Next Goal" even though it's now "In Progress" +- Without automated sync validation, drift will accumulate with every PR + +**Outcome:** Repositories and teams can trust documentation as the source of truth; locale sync is automated and validated. + +--- + +## Parent Agent & Delegation + +- **Parent Agent:** `@roadmap-steward` +- **Child Agents:** + - `@documentation-steward` — Tasks 1 & 2 + - `@project-auditor` — Task 3 + +--- + +## Success Criteria (Definition of Done) + +- ✅ PT-BR documentation is fully in sync with EN (0 content divergences) +- ✅ All locale-specific issues corrected ("Ejecute" → "Execute", L4 status updated, etc.) +- ✅ Governance docs (governance-maturity-model.md, zero-inertia-command.md) reflect L4 as "In Progress" +- ✅ Locale sync validation step deployed in CI (marketplace-integrity.yml or new workflow) +- ✅ CI blocks PRs that lack valid locale sync declaration +- ✅ Manual testing confirms sync step works as specified +- ✅ ROADMAP.md updated: N5 marked "✅ Completed", next sprint (N6) outlined +- ✅ MEMORY.md receives entry: MEM-20260513-01 documenting N5 approval & completion + +--- + +## Context Links + +- **Audit Report:** Generated 2026-05-13 post-N4 completion +- **Gap Analysis:** 2 HIGH gaps (PT-BR drift, sync automation missing) + 2 MEDIUM gaps +- **Maturity Status:** L4 threshold met (100% eval coverage, CI audit operational); entering stabilization phase +- **Previous Mission:** [N4 Hardening](roadmap-2026-05-12-sprint-n4-hardening.md) + +--- + +## Tasks Breakdown + +### Task 1: Execute sync-locales & Validate PT-BR Coherence + +**Owner:** `@documentation-steward` +**Acceptance Gate:** Before Task 2 starts + +**What to do:** + +1. Run a comprehensive audit of `docs/en/` vs `docs/pt-br/` using sync-locales.prompt.md +2. 
Identify and fix all divergences (content, structure, locale-specific issues) +3. Correct known issues: + - "Ejecute" (Spanish) → "Execute" (Portuguese) in zero-inertia-command.md + - L4 status references update: "🎯 Next Goal" → "🚧 In Progress" +4. Validate structure mirrors: EN examples ↔ PT-BR examples + +**Output Contract:** + +- List of all divergences found (with locations) +- Corrected file blocks (ready to paste) +- Validation table (File | Status: Aligned/Fixed) +- Decision log (each change + reasoning) + +**PASS Criteria:** + +- No Spanish words in PT-BR locale +- All L4 references show "In Progress" (not "Next Goal") +- EN and PT-BR structures identical +- All links valid +- Decision log complete + +--- + +### Task 2: Update Governance Docs to Reflect L4 Status + +**Owner:** `@documentation-steward` +**Prerequisite:** Task 1 complete +**Acceptance Gate:** Before Task 3 starts + +**What to do:** + +1. Update L4 status in `library/standards/governance-maturity-model.md`: + - Change "🎯 Next Goal" → "🚧 In Progress" + - Add note: "Eval coverage 100%, CI audit operational, entering stabilization phase" +2. Update `docs/en/zero-inertia-command.md` (and sync PT-BR): + - Mention L4 achievement in appropriate section + - Update any stale references to L4 timeline +3. 
Update `ROADMAP.md`: + - Add closure note for N4 sprint + - Clearly mark N5 as current sprint + - Outline N6 in backlog (as placeholder) + +**Output Contract:** + +- Updated content blocks for each file +- Before/after comparison +- Grep confirmation: "Next Goal" removed, "In Progress" inserted +- Timestamp: 2026-05-13 (decision date) +- Decision log + +**PASS Criteria:** + +- All 3 files updated +- L4 shows "In Progress", L5 stays "Vision" +- Timestamp + decision note in each file +- No broken Markdown syntax +- No new broken references + +--- + +### Task 3: Operationalize Locale Sync Validation in CI + +**Owner:** `@project-auditor` +**Prerequisite:** Task 2 complete +**Acceptance Gate:** Manual test confirms functionality + +**What to do:** + +1. Extend `.github/workflows/marketplace-integrity.yml` with new step: "Locale Sync Validation (PR)" + - Or create new workflow if appropriate +2. Step should validate: + - Files in `docs/en/` have PT-BR equivalents (and vice versa) + - No Spanish text in EN locale (sanity check) + - Files contain valid locale sync declaration block +3. Report divergences as PR annotations (specific paths) +4. 
Behavior: + - **PASS:** All checks clear → exit code 0 + - **FAIL:** Divergence or missing declaration → exit code 1 (blocks PR) + +**Output Contract:** + +- YAML step definition (ready to paste) +- Shell/Python script for checks (inline or as .github/scripts/) +- Example PASS/FAIL run output +- Test evidence (actual run results) + +**PASS Criteria:** + +- Step runs on every PR touching `docs/` or `library/examples/` +- Detects missing PT-BR equivalents +- Detects stale locale declarations +- Blocks PR on HIGH divergences +- Manual test run confirms works correctly +- Decision log complete + +--- + +## Risk Mitigation + +| Risk | Severity | Mitigation | +| --------------------------------------------------- | -------- | --------------------------------------------------------------------------------------------- | +| PT-BR sync uncovers more drift than expected | Medium | Work in parallel (Tasks 1 & 2); cap extra findings at 2h, defer rest to N6 | +| CI validation step has edge cases | Medium | Thorough testing on branch; fallback: disable on initial merge, re-enable after manual review | +| Locale declaration format becomes point of friction | Low | Standardize early (YAML comment block), document in contributing guidelines | + +--- + +## Delivery Target + +**Delivery Date:** 2026-06-09 +**Merge Criteria:** + +- All 3 tasks PASS +- ROADMAP.md updated (N5 "✅ Completed", N6 outlined) +- MEMORY.md entry created (MEM-20260513-01) +- PR approved by roadmap-steward + documentation-steward + +--- + +## Decision Log + +| Date | Agent | Decision | Context | +| ---------- | --------------- | -------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | +| 2026-05-13 | roadmap-steward | **Sprint N5 approved** | Post-N4 audit identified 2 HIGH gaps (PT-BR sync, sync automation). N5 addresses both + updates docs. Master prompt provided. 
| +| 2026-05-13 | roadmap-steward | **Sync automation priority elevated** | Was in backlog; moved to current sprint due to HIGH impact on trust/docs quality. | +| 2026-05-13 | roadmap-steward | **L4 status updated to "In Progress"** | 100% eval coverage achieved, CI audit operational. Level 4 threshold met; now in maturation phase. | +| 2026-05-13 | roadmap-steward | **Entry gate for Swift clarified** | Defer until 2 stable sprints post-L4 (N5 + N6). Reduces ambiguity; explicit criteria: "zero regressions in L4 gates." | + +--- + +## Next Actions (After N5 Completion) + +1. **Sprint N6 Planning** (2026-06-09): + - Re-audit maturity post-N5 + - Evaluate Swift readiness (has 2-sprint stabilization gate been met?) + - Plan N6 scope (documentation automation, Swift expansion, or other) + +2. **Ongoing:** + - Maintain MEMORY.md entries (≥1 per sprint) + - Keep ROADMAP.md current + - Monitor L4 gates for regressions + +--- + +**Mission File Version:** 1.0.0 +**Last Updated:** 2026-05-13 +**Status:** Active & Approved diff --git a/ROADMAP.md b/ROADMAP.md index e26200a..e0f45cc 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -2,34 +2,39 @@ > This file is maintained by the `@roadmap-steward` agent. Do not edit manually unless updating strategic direction. See [Manual Edit Policy](#manual-edit-policy) below. -## ✅ Completed Sprint (N4 Hardening — 2026-05-12 to 2026-05-26) +## 🗓️ Current Sprint (N5 — Locale Sync & L4 Stabilization — 2026-05-26 to 2026-06-09) -| Task | Priority | Status | Owner | Acceptance Criteria | -| ------------------------------------------ | -------- | ------ | ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Close Eval Coverage Gap for Core Agents | High | Done | `@asset-factory` | Cobertura de diretórios em `library/evals/` suba de 78.6% (11/14) para ≥90% (≥13/14). 
Agentes: `agent-factory`, `issue-ops-architect`, `rule-distiller`. | -| Operationalize Continuous Governance Audit | High | Done | `@project-auditor` | PR audit workflow gera resumo com gaps por severidade; falha em severidade Alta. Relatório em CI visível e acionável. | -| Activate Decision Memory Routine | Medium | Done | `@context-steward` | `library/github-baseline/MEMORY.md` recebe ≥2 entradas reais do sprint com template correto (ID, data, owner, impacto, evidência). | +| Task | Priority | Status | Owner | Acceptance Criteria | +| ----------------------------------------------- | -------- | ----------- | ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------- | +| Execute sync-locales & Validate PT-BR Coherence | High | Not Started | `@documentation-steward` | Nenhuma divergência EN ↔ PT-BR; "Ejecute" → "Execute"; referências de maturity atualizadas; sync declaration adicionada a docs. | +| Update Governance Docs to Reflect L4 Status | High | Not Started | `@documentation-steward` | governance-maturity-model.md, zero-inertia-command.md, ROADMAP.md mostram L4 como "In Progress"; timestamp e decision log adicionados. | +| Operationalize Locale Sync Validation in CI | Medium | Not Started | `@project-auditor` | Novo step no marketplace-integrity.yml ou workflow; valida locale sync; bloqueia PR sem declaração válida; teste manual confirmado. | -**Context**: This sprint closes the maturity gap between Level 3 (AI-Driven) and Level 4 (Self-Auditable). Success unblocks backlog and reduces rework. See [Mission File](.github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md) for details. +**Context**: Stabilizing Level 4 (Self-Auditable) by eliminating locale drift and automating sync validation. Post-N4 audit identified PT-BR gaps and automation missing. See [Mission File](.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md) for details. 
 
 ---
 
-## 📝 Backlog (Future Sprints)
+## ✅ Completed Work
+
+### Sprint N4 Hardening (2026-05-12 to 2026-05-26)
 
-### Stack Expansion (Swift) — Deferred until Level 4 Stability
+- **Close Eval Coverage Gap for Core Agents** (High, Done): Cobertura suba de 78.6% para 100%; 6 evals criadas (AGFAC-01/02, IOA-01/02, RD-01/02) + 3 HIGH findings corrigidos.
+- **Operationalize Continuous Governance Audit** (High, Done): Step `Governance Audit — Eval Coverage Report (PR)` adicionada a marketplace-integrity.yml; relatório em CI visível e acionável.
+- **Activate Decision Memory Routine** (Medium, Done): MEMORY.md recebe 3 entradas (MEM-20260512-01/02/03) documentando decisões do sprint.
 
-- **Rationale**: Evitar dispersão antes da maturidade Self-Auditable.
-- **Entry Gate**: Somente iniciar após 2 ciclos de sprint com gates de missão/auditoria estáveis e sem regressão crítica.
+---
+
+## 📝 Backlog (Future Sprints)
 
-### Locale Sync Automation (en ↔ pt-br)
+### Stack Expansion (Swift) — Deferred until Level 4 Stability (N6+)
 
-- **Rationale**: Reforçar qualidade documental após estabilização dos gates de governança.
-- **Entry Gate**: PR deve manter declaração de sync (`en_status`, `pt_br_status`, `pending_sync_tasks`) validada em CI.
+- **Rationale**: Evitar dispersão antes da maturidade Self-Auditable consolidada.
+- **Entry Gate**: Iniciar após 2 ciclos completos pós-L4 (N5 + N6) com zero regressões críticas em gates de missão/auditoria.
 
-### Technical Documentation Automation
+### Technical Documentation Automation (N6+)
 
 - **Rationale**: Reduzir esforço manual recorrente em onboarding e manutenção documental.
-- **Entry Gate**: Executar somente após cobertura de evals ≥80% sustentada por 2 ciclos.
+- **Entry Gate**: Executar após N5 completo e locale sync automation validada em 2 ciclos.
 
 ---
 
diff --git a/library/github-baseline/MEMORY.md b/library/github-baseline/MEMORY.md
index dee2687..0870d5a 100644
--- a/library/github-baseline/MEMORY.md
+++ b/library/github-baseline/MEMORY.md
@@ -55,6 +55,16 @@ Keep a lightweight, auditable memory of operational facts shared across human an
+- **ID:** MEM-20260513-01
+- **Date:** 2026-05-13
+- **Owner:** roadmap-steward
+- **Record:** Sprint N5 (Locale Sync & L4 Stabilization) approved. Post-N4 audit identified PT-BR desincronizado (stale L4 references, typos em espanhol). Sync automation never operationalized. N5 scope: 3 tasks — sync PT-BR, update docs, operationalize CI validation. Target: 2026-06-09. Mission file created.
+- **Impact:** Documentação será confiável e em sync; locale drift automation evita acúmulo futuro; L4 status refletido corretamente.
+- **Evidence/Link:** `.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md` ; audit report inline em conversation
+- **Status:** active
+
+---
+
 - **ID:** MEM-20260512-03
 - **Date:** 2026-05-12
 - **Owner:** asset-factory (sprint N4 Hardening)

From 07e292501fa050533045a05c6525c67391ed7742 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Wed, 13 May 2026 17:00:10 +0000
Subject: [PATCH 3/3] fix: resolve PR review feedback for mission/evals/workflow consistency

Agent-Logs-Url: https://github.com/weslleycapelari/github-patterns/sessions/577201a0-68d1-468c-b38c-e3f9468cf6e8

Co-authored-by: weslleycapelari <28955078+weslleycapelari@users.noreply.github.com>
---
 .../roadmap-2026-05-12-sprint-n4-hardening.md | 2 +-
 ...oadmap-2026-05-26-sprint-n5-locale-sync.md | 22 ++++++++++++++++++-
 .github/workflows/marketplace-integrity.yml | 4 +++-
 ROADMAP.md | 2 +-
 library/evals/README.md | 3 ++-
 .../case-02-expected-high-quality-output.md | 7 +++---
 .../case-02-expected-high-quality-output.md | 14 ++++++------
 library/github-baseline/MEMORY.md | 2 +-
 8 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/.github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md b/.github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md
index 54b0350..16a80a1 100644
--- a/.github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md
+++ b/.github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md
@@ -45,7 +45,7 @@ Result_Payload:
 | Sobrecarga operacional em time small | Média | Critérios binários em CI; não exigir perfeccionismo em evals de primeira versão. |
 | Cobertura de evals pode ser genérica inicialmente | Média | Review gate obrigatório; iteração rápida em ciclo seguinte com feedback real. |
 | MEMORY.md requer disciplina contínua | Baixa | Atribuir ownership ao context-steward; review semanal enxuta (15 min max). |
-| Auditoria pode gerar falsos positivos | Média | Usar regras simples na primeira versão; refinar baseado em experience. |
+| Auditoria pode gerar falsos positivos | Média | Usar regras simples na primeira versão; refinar baseado em experiência. |
 
 ---
 
diff --git a/.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md b/.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md
index 432a713..ec27765 100644
--- a/.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md
+++ b/.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md
@@ -5,6 +5,26 @@
 **Duration:** 2026-05-26 to 2026-06-09 (2 weeks)
 **Status:** Approved & Active
 
+Parent_Agent: roadmap-steward
+Child_Agent: documentation-steward
+Mission_Objective: Stabilize Level 4 (Self-Auditable) by eliminating locale drift between English and Portuguese documentation, operationalizing sync automation in CI, and updating governance docs to reflect current maturity status.
+Context_Links:
+- ROADMAP.md
+- .github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md
+- .github/workflows/marketplace-integrity.yml
+- library/github-baseline/MEMORY.md
+Success_Criteria:
+- PT-BR documentation in sync with EN (0 content divergences)
+- Locale-specific issues corrected
+- Governance docs reflect L4 as "In Progress"
+- Locale sync validation in CI
+- PR declaration requirement for locale sync enforced
+Result_Payload:
+- List of divergences found and fixed
+- Updated governance documentation blocks
+- CI locale sync validation evidence
+- Decision log with traceable rationale
+
 ---
 
 ## Mission Objective
@@ -13,7 +33,7 @@ Stabilize **Level 4 (Self-Auditable)** by eliminating locale drift between Engli
 
 **Why This Matters:**
 
-- Post-N4 audit identified PT-BR desincronizado (typos in Spanish, stale L4 references)
+- Post-N4 audit identified documentação PT-BR dessincronizada (typos in Spanish, stale L4 references)
 - Sync automation (`sync-locales.prompt.md`) exists but was never operationalized
 - Docs claim "L4 is Next Goal" even though it's now "In Progress"
 - Without automated sync validation, drift will accumulate with every PR
diff --git a/.github/workflows/marketplace-integrity.yml b/.github/workflows/marketplace-integrity.yml
index cf800ef..97dafe5 100644
--- a/.github/workflows/marketplace-integrity.yml
+++ b/.github/workflows/marketplace-integrity.yml
@@ -146,7 +146,9 @@ jobs:
           cases = [
               f for f in os.listdir(agent_dir)
-              if f.endswith(".md") and f != ".gitkeep"
+              if os.path.isfile(os.path.join(agent_dir, f))
+              and f.startswith("case-")
+              and f != ".gitkeep"
               and not f.startswith("README")
           ]
           count = len(cases)
diff --git a/ROADMAP.md b/ROADMAP.md
index e0f45cc..af66c9f 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -18,7 +18,7 @@
 
 ### Sprint N4 Hardening (2026-05-12 to 2026-05-26)
 
-- **Close Eval Coverage Gap for Core Agents** (High, Done): Cobertura suba de 78.6% para 100%; 6 evals criadas (AGFAC-01/02, IOA-01/02, RD-01/02) + 3 HIGH findings corrigidos.
+- **Close Eval Coverage Gap for Core Agents** (High, Done): Cobertura subiu de 78.6% para 100%; 6 evals criadas (AGFAC-01/02, IOA-01/02, RD-01/02) + 3 HIGH findings corrigidos.
 - **Operationalize Continuous Governance Audit** (High, Done): Step `Governance Audit — Eval Coverage Report (PR)` adicionada a marketplace-integrity.yml; relatório em CI visível e acionável.
 - **Activate Decision Memory Routine** (Medium, Done): MEMORY.md recebe 3 entradas (MEM-20260512-01/02/03) documentando decisões do sprint.
 
diff --git a/library/evals/README.md b/library/evals/README.md
index 6e3048d..5cfaabc 100644
--- a/library/evals/README.md
+++ b/library/evals/README.md
@@ -51,7 +51,8 @@ library/evals/
 | registry-schema-governor | 2 | 0 | 2 | draft |
 | asset-factory | 2 | 0 | 2 | draft |
 | repo-architect | 2 | 0 | 2 | draft |
-| issue-ops-architect | 0 | 0 | 0 | pending |
+| issue-ops-architect | 2 | 0 | 2 | draft |
+| rule-distiller | 2 | 0 | 2 | draft |
 | asset-review-board | 2 | 0 | 2 | draft |
 | documentation-steward | 2 | 0 | 2 | draft |
 | prompt-studio | 2 | 0 | 2 | draft |
diff --git a/library/evals/issue-ops-architect/case-02-expected-high-quality-output.md b/library/evals/issue-ops-architect/case-02-expected-high-quality-output.md
index 5a2ec55..04efcf1 100644
--- a/library/evals/issue-ops-architect/case-02-expected-high-quality-output.md
+++ b/library/evals/issue-ops-architect/case-02-expected-high-quality-output.md
@@ -10,7 +10,7 @@
 
 ## Scenario
 
-The requester provides all mandatory intake fields. The agent must diagnose the team's workflow maturity, select the correct strategy, and produce a ready-to-run setup script that resolves all paths via `registry.json`.
+The requester provides all mandatory intake fields and explicitly confirms the strategy choice. The agent must diagnose the team's workflow maturity and produce a ready-to-run setup script that resolves all paths via `registry.json`.
 
 ## Input (prompt payload)
 
@@ -22,11 +22,12 @@ Preferred OS for script: Windows PowerShell
 Enable blank issues: no
 
 Please provision the issue template strategy and generate the setup script.
+Confirmed strategy choice: agile-scrum
 ```
 
 ## Required Output Contract
 
-1. Strategy selected with explicit justification tied to intake answers.
+1. Confirmed strategy (`agile-scrum`) acknowledged with explicit justification tied to intake answers.
 2. Maturity diagnosis (brief summary of why this strategy fits).
 3. A complete, runnable PowerShell script that:
    - Creates `.github/ISSUE_TEMPLATE/` directory.
@@ -38,7 +39,7 @@ Please provision the issue template strategy and generate the setup script.
 
 ## PASS Criteria (all required)
 
-1. Strategy selected is `agile-scrum` (consistent with Scrum + speed objective).
+1. Uses the explicitly confirmed strategy `agile-scrum` (consistent with Scrum + speed objective).
 2. Justification explicitly references at least 2 intake answers.
 3. Script is syntactically valid PowerShell.
 4. Script creates `.github/ISSUE_TEMPLATE/` directory.
diff --git a/library/evals/rule-distiller/case-02-expected-high-quality-output.md b/library/evals/rule-distiller/case-02-expected-high-quality-output.md
index 1e13219..f2893de 100644
--- a/library/evals/rule-distiller/case-02-expected-high-quality-output.md
+++ b/library/evals/rule-distiller/case-02-expected-high-quality-output.md
@@ -10,7 +10,7 @@
 
 ## Scenario
 
-The requester provides all mandatory intake fields, including a traceable source and explicit CONFIRM gate. The agent must produce a structured instruction file with separated BC and NBP sections, operational checklist, and full source traceability.
+The requester provides all mandatory intake fields, including a traceable source and explicit CONFIRM gate. The agent must produce a structured preview with separated BC and NBP sections, operational checklist, and full source traceability, then write the file only after a separate explicit `CONFIRM` message.
 
 ## Input (prompt payload)
 
@@ -22,25 +22,25 @@ Output target: new file at library/instructions/laravel-11.md
 Strictness level: strict
 Migration checklist: yes
 Output language preference: English
-May I create/modify files after preview: yes
+May I create/modify files after preview: only after explicit CONFIRM
 
 Please distill the release notes into Copilot instructions.
 ```
 
 ## Required Output Contract
 
-1. A complete instruction file at `library/instructions/laravel-11.md` containing:
+1. After an explicit `CONFIRM`, a complete instruction file at `library/instructions/laravel-11.md` containing:
    - **Purpose** section (one paragraph).
    - **Breaking Changes** section with at least 2 entries, each containing: `BC-XX` ID, Impact, Required Action (imperative verb), Verification step.
    - **New Best Practices** section with at least 2 entries, each containing: `NBP-XX` ID, Benefit, Adoption guidance (imperative verb), Verification step.
    - **Operational Checklist** (migration checklist, since requested).
    - **Source Traceability** section with exact URL(s) and retrieval date.
-2. Preview shown before file creation (since CONFIRM was given).
+2. Preview shown first, and file creation blocked until a separate explicit `CONFIRM` message is received.
 3. Decision log with actor, decision, and ISO-8601 timestamp.
 
 ## PASS Criteria (all required)
 
-1. File is created at the exact path `library/instructions/laravel-11.md`.
+1. File is created at the exact path `library/instructions/laravel-11.md` only after explicit `CONFIRM`.
 2. Breaking Changes and New Best Practices are in separate sections (not mixed). When output language preference is English, all section titles must be in English — section must be titled "New Best Practices" (not "Novas Boas Práticas").
 3. Every BC-XX entry has: ID, Impact, Required Action with imperative verb, Verification.
 4. Every NBP-XX entry has: ID, Benefit, Adoption guidance with imperative verb, Verification.
@@ -48,7 +48,7 @@ Please distill the release notes into Copilot instructions.
 6. Source Traceability includes the exact URL provided and a retrieval date.
 7. No rule entry is generated from LLM general knowledge (all rules traceable to source).
 8. Decision log has at least 1 entry with actor, decision, and ISO-8601 timestamp.
-9. Preview is shown before file write is executed.
+9. Preview is shown first and file write is executed only after a separate explicit `CONFIRM`.
 
 ## FAIL Conditions
 
@@ -59,7 +59,7 @@ Please distill the release notes into Copilot instructions.
 - Operational Checklist is missing when migration checklist was requested.
 - File written to a path other than `library/instructions/laravel-11.md`.
 - No decision log.
-- File created without showing a preview first.
+- File created before preview, or created without a separate explicit `CONFIRM` after preview.
 
 ## Gate Decision
 
diff --git a/library/github-baseline/MEMORY.md b/library/github-baseline/MEMORY.md
index 0870d5a..04dedef 100644
--- a/library/github-baseline/MEMORY.md
+++ b/library/github-baseline/MEMORY.md
@@ -58,7 +58,7 @@ Keep a lightweight, auditable memory of operational facts shared across human an
 - **ID:** MEM-20260513-01
 - **Date:** 2026-05-13
 - **Owner:** roadmap-steward
-- **Record:** Sprint N5 (Locale Sync & L4 Stabilization) approved. Post-N4 audit identified PT-BR desincronizado (stale L4 references, typos em espanhol). Sync automation never operationalized. N5 scope: 3 tasks — sync PT-BR, update docs, operationalize CI validation. Target: 2026-06-09. Mission file created.
+- **Record:** Sprint N5 (Locale Sync & L4 Stabilization) approved. Post-N4 audit identified documentação PT-BR dessincronizada (stale L4 references, typos em espanhol). Sync automation never operationalized. N5 scope: 3 tasks — sync PT-BR, update docs, operationalize CI validation. Target: 2026-06-09. Mission file created.
 - **Impact:** Documentação será confiável e em sync; locale drift automation evita acúmulo futuro; L4 status refletido corretamente.
 - **Evidence/Link:** `.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md` ; audit report inline em conversation
 - **Status:** active