From 5286b39deb02aa81cc5812813c647a0c8dc94aa0 Mon Sep 17 00:00:00 2001
From: Oleg Shulyakov
Date: Sun, 24 May 2026 14:19:12 +0300
Subject: [PATCH] feat(skills): add adapt skill for evidence-driven change detection

- Add `.agents/skills/adapt/SKILL.md` with workflow, boundaries, and verification
- Add `.agents/skills/adapt/evals/evals.json` with 9 eval prompts
- Register `adapt` in `.agents/skills/README.md` skills table
- Refine `metadata.references` guidance in `create-skill` authoring docs
- Update PRD, SPEC, and user stories to include `adapt` as the 10th general skill
- Add design decisions (name, routing-only behavior) to PRD/SPEC decision logs
---
 .agents/skills/README.md                      |  1 +
 .agents/skills/adapt/SKILL.md                 | 87 +++++++++++++++++++
 .agents/skills/adapt/evals/evals.json         | 67 ++++++++++++++
 .agents/skills/create-skill/SKILL.md          |  2 +-
 .../create-skill/references/authoring.md      |  4 +-
 docs/2026-05-20-general-agent-skills/PRD.md   | 26 ++++--
 docs/2026-05-20-general-agent-skills/SPEC.md  | 44 +++++++---
 ...US-001-author-standalone-general-skills.md |  9 +-
 .../US-002-generate-skill-evals.md            |  5 +-
 9 files changed, 216 insertions(+), 29 deletions(-)
 create mode 100644 .agents/skills/adapt/SKILL.md
 create mode 100644 .agents/skills/adapt/evals/evals.json

diff --git a/.agents/skills/README.md b/.agents/skills/README.md
index 87f0e7f..d3ec227 100644
--- a/.agents/skills/README.md
+++ b/.agents/skills/README.md
@@ -8,6 +8,7 @@ A complete skill is a directory with a required `SKILL.md` file and optional bun

 | Skill | Use it for | Notable resources |
 | --- | --- | --- |
+| [`adapt`](adapt/SKILL.md) | Detecting evidence-driven change needs and routing updates to the right skill, workflow, artifact, or owner. | [`evals/`](adapt/evals/) |
 | [`ask-questions`](ask-questions/SKILL.md) | Generating high-leverage questions, clarifying missing context, and surfacing assumptions.
| [`evals/`](ask-questions/evals/) |
 | [`audit-skill-security`](audit-skill-security/SKILL.md) | Auditing third-party or local skills before installing, updating, or trusting them. | [`references/audit-protocol.md`](audit-skill-security/references/audit-protocol.md) |
 | [`classify-content`](classify-content/SKILL.md) | Organizing material into meaningful groups by criteria, similarity, priority, dependency, or abstraction level. | [`evals/`](classify-content/evals/) |
diff --git a/.agents/skills/adapt/SKILL.md b/.agents/skills/adapt/SKILL.md
new file mode 100644
index 0000000..a844194
--- /dev/null
+++ b/.agents/skills/adapt/SKILL.md
@@ -0,0 +1,87 @@
+---
+name: adapt
+description: Detect evidence-driven change needs. Use when the user says "adapt based on this", "what should change after this?", "this keeps happening", "this failed, what should change?", "the workflow no longer fits", "the constraints changed", or asks what skill, rule, doc, eval, memory, or process should change.
+license: MIT
+tags:
+  - adaptation
+  - feedback
+  - process
+metadata:
+  author: Oleg Shulyakov
+  version: "1.0.0"
+  source: github.com/olegshulyakov/agent.md
+  catalog: utility
+  category: productivity
+  references:
+    - create-skill
+    - create-rule
+    - remember-context
+    - write-prd
+    - write-spec
+    - write-tech-docs
+    - write-tests
+    - write-user-story
+---
+
+# adapt
+
+## Workflow
+
+**Detect what should change going forward because reality contradicted the current setup.**
+
+1. Identify the adaptation signal: observed outcome, user feedback, failure, repeated friction, outdated assumption, or changed constraint.
+2. Decide whether the signal is durable enough to justify adaptation, or only a one-off exception.
+3. Identify the affected behavior or artifact: skill, rule, workflow, document, eval, memory convention, test, or process.
+4. 
Check whether the symptom comes from a governing convention, template, or source-of-truth artifact; route the change there instead of patching only the local artifact.
+5. State the smallest useful change that would prevent recurrence or fit the new constraint.
+6. Route the actual update to the appropriate follow-up skill, workflow, or owner.
+7. Define how the adaptation should be verified.
+
+---
+
+## Output
+
+**Make the diagnosis actionable without doing the update by default.**
+
+- **Signal:** State what happened or changed.
+- **Interpretation:** Explain why it indicates a future behavior or artifact may need to change.
+- **Target:** Name the affected artifact, behavior, or process when identifiable.
+- **Change:** Describe the smallest useful adaptation.
+- **Route:** Identify the appropriate follow-up skill, workflow, or owner for the actual update.
+- **Verification:** State what should be true after the adaptation.
+
+---
+
+## Boundaries
+
+**Adaptation is diagnosis and routing, not a generic rewrite workflow.**
+
+- **Do not overfit:** Avoid changing durable behavior for a single ambiguous incident unless the user explicitly wants a one-off correction.
+- **Do not rewrite by default:** Do not edit skills, rules, docs, evals, tests, or memory unless the user separately asks to proceed with that update.
+- **Use evidence:** Base adaptation on observed outcomes, feedback, failures, repeated friction, outdated assumptions, or changed constraints.
+- **Route precisely:** Skills belong with skill-authoring workflows, rules with rule-authoring workflows, docs with writing workflows, tests with testing workflows, and durable facts with memory workflows.
+- **Prefer source of truth:** When the mismatch comes from a convention, template, or authoring guidance, adapt that governing artifact instead of only patching the artifact that exposed the problem.
+- **Keep uncertainty visible:** If the target artifact or change is unclear, name the likely candidates and the evidence needed to choose.
+
+---
+
+## Non-Triggers
+
+**Nearby requests often need another mode unless they ask what should change going forward.**
+
+- **Direct editing:** "Update this doc" or "rewrite this skill" should perform the requested artifact update, not stop at adaptation diagnosis.
+- **Memory only:** "Remember this decision" should preserve durable context rather than diagnose process change.
+- **Root-cause only:** "Why did this fail?" should explain or investigate unless the user asks what should change afterward.
+- **Decision only:** "Which option should we choose?" should compare options and recommend a direction.
+- **Planning only:** "Break this down" should produce a plan when the desired change is already known.
+
+---
+
+## Verification
+
+**A good adaptation recommendation changes future behavior without broadening the system unnecessarily.**
+
+- **Evidence test:** The recommendation ties back to a concrete signal, not a vague desire to improve.
+- **Durability test:** The change handles a repeated or likely future condition, not only the exact current wording.
+- **Scope test:** The proposed change is the smallest artifact or behavior update that addresses the signal.
+- **Routing test:** The actual update path is clear and does not create a hidden runtime dependency on another skill.
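The six Output fields and the "Route precisely" boundary in the new `SKILL.md` can be read as a small record plus a lookup from artifact kind to follow-up skill. A minimal Python sketch of that reading follows. The names `ROUTES`, `AdaptationReport`, and `route` are hypothetical, and the pairing of artifact kinds with the skills listed in `metadata.references` is an assumption; the skill itself is prose and defines no code.

```python
from dataclasses import dataclass

# Hypothetical lookup: artifact kind -> follow-up skills taken from
# metadata.references. The pairing is an assumption based on the
# "Route precisely" bullet, not something the patch defines.
ROUTES = {
    "skill": ["create-skill"],
    "rule": ["create-rule"],
    "doc": ["write-prd", "write-spec", "write-tech-docs", "write-user-story"],
    "test": ["write-tests"],
    "memory": ["remember-context"],
}


@dataclass
class AdaptationReport:
    """Mirrors the six Output fields of SKILL.md, in the same order."""

    signal: str
    interpretation: str
    target: str
    change: str
    route: list[str]
    verification: str


def route(artifact_kind: str) -> list[str]:
    """Return candidate follow-up skills; an empty list means the route is unclear."""
    return list(ROUTES.get(artifact_kind, []))


# Illustrative values only, loosely based on eval prompt adapt-005.
report = AdaptationReport(
    signal="A skill was patched, but the authoring guidance caused the mismatch.",
    interpretation="The governing guidance will reproduce the problem in other skills.",
    target="skill",
    change="Update the authoring guidance instead of the local skill.",
    route=route("skill"),
    verification="New skills written from the guidance no longer show the mismatch.",
)
```

An unknown kind such as `"eval"` returns an empty list rather than a guess, which matches the "Keep uncertainty visible" boundary: the caller has to name candidates instead of receiving a default route.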
diff --git a/.agents/skills/adapt/evals/evals.json b/.agents/skills/adapt/evals/evals.json
new file mode 100644
index 0000000..9c1c206
--- /dev/null
+++ b/.agents/skills/adapt/evals/evals.json
@@ -0,0 +1,67 @@
+{
+  "evals": [
+    {
+      "id": "adapt-001",
+      "category": "true-positive",
+      "prompt": "This failed, what should change so we don't repeat it?",
+      "expected_trigger": "Trigger adapt.",
+      "expected_output": "Identifies the failure signal, decides whether durable adaptation is warranted, names the affected behavior or artifact, routes the actual update, and defines verification."
+    },
+    {
+      "id": "adapt-002",
+      "category": "true-positive",
+      "prompt": "Our instructions didn't handle this edge case. What artifact should change because of this?",
+      "expected_trigger": "Trigger adapt.",
+      "expected_output": "Maps the observed gap to likely instruction, rule, skill, eval, or doc targets and recommends the smallest follow-up update path."
+    },
+    {
+      "id": "adapt-003",
+      "category": "true-positive",
+      "prompt": "We keep hitting this issue during reviews. Do we need to update a skill, rule, doc, eval, or memory?",
+      "expected_trigger": "Trigger adapt.",
+      "expected_output": "Treats the repeated friction as an adaptation signal, compares likely target artifacts, and routes the actual change without rewriting by default."
+    },
+    {
+      "id": "adapt-004",
+      "category": "true-positive",
+      "prompt": "The constraints changed, and the workflow no longer fits. Adapt based on this.",
+      "expected_trigger": "Trigger adapt.",
+      "expected_output": "Identifies the changed constraint, explains the mismatch, proposes the smallest workflow or artifact adaptation, and states verification criteria."
+    },
+    {
+      "id": "adapt-005",
+      "category": "true-positive",
+      "prompt": "You patched this skill, but the real issue was the authoring guidance. 
What should change?",
+      "expected_trigger": "Trigger adapt.",
+      "expected_output": "Recognizes that the durable change belongs in the governing convention or authoring source of truth, not only in the local skill that exposed the mismatch."
+    },
+    {
+      "id": "adapt-006",
+      "category": "false-positive",
+      "prompt": "Improve this skill's examples and validation checklist.",
+      "expected_trigger": "Do not trigger adapt as the primary skill.",
+      "expected_output": "Should use the skill-authoring workflow directly because the user already requested a concrete skill update."
+    },
+    {
+      "id": "adapt-007",
+      "category": "false-positive",
+      "prompt": "Remember that we decided to keep adapt as a routing skill.",
+      "expected_trigger": "Do not trigger adapt.",
+      "expected_output": "Should preserve durable memory if appropriate rather than diagnose what should change."
+    },
+    {
+      "id": "adapt-008",
+      "category": "non-trigger",
+      "prompt": "Why did the deployment fail yesterday?",
+      "expected_trigger": "Do not trigger adapt.",
+      "expected_output": "Should investigate or explain the failure unless the user asks what should change going forward."
+    },
+    {
+      "id": "adapt-009",
+      "category": "non-trigger",
+      "prompt": "Break the migration into milestones with risks and validation steps.",
+      "expected_trigger": "Do not trigger adapt.",
+      "expected_output": "Should produce a work plan because the user asked for sequencing, not evidence-driven adaptation."
+    }
+  ]
+}
diff --git a/.agents/skills/create-skill/SKILL.md b/.agents/skills/create-skill/SKILL.md
index f38babb..107f273 100644
--- a/.agents/skills/create-skill/SKILL.md
+++ b/.agents/skills/create-skill/SKILL.md
@@ -60,7 +60,7 @@ If the request spans multiple phases, read the references in workflow order: aut
 - **Scan anchors**: use bold labels for distinct rule bullets in prose skill docs unless the section is a schema, command example, or literal output template.
 - **Size discipline**: keep metadata under 100 tokens and the main instruction body under 500 lines; use references for anything that would push past that.
 - **Metadata fields**: use only `name`, `description`, `license`, `tags`, and `metadata` at the top level; put `author`, `version`, `source`, `catalog`, `category`, and `references` under `metadata`.
-- **Reference metadata**: use `metadata.references` only for local skills or rules the skill uses as part of its workflow; do not list route-away, adjacent-skill, near-miss, or boundary mentions.
+- **Reference metadata**: use `metadata.references` for local skills or rules the skill uses as part of its workflow, including router skills that name follow-up skills as intended routes. Do not list adjacent-skill, near-miss, boundary, or example-only mentions.
 - **Pushy descriptions**: explicitly name the user phrases and contexts that should trigger the skill, not just what it does. Claude tends to undertrigger, so err toward specificity.
 - **Trigger placement**: put all "when to use" and skill-call scope information in the frontmatter `description`; do not add a body `Scope` section for activation criteria. Put routing, exclusions, boundaries, examples, and detailed procedures in the body or references.
 - **No placeholders**: add `scripts/`, `references/`, `assets/`, or `evals/` only when the skill actually uses them.
diff --git a/.agents/skills/create-skill/references/authoring.md b/.agents/skills/create-skill/references/authoring.md
index 1fac09e..591a37f 100644
--- a/.agents/skills/create-skill/references/authoring.md
+++ b/.agents/skills/create-skill/references/authoring.md
@@ -43,9 +43,9 @@ Common nested metadata fields:
 | `metadata.source` | Repository or canonical source reference, such as `github.com/org/repo`. |
 | `metadata.catalog` | Optional catalog grouping string.
|
 | `metadata.category` | Optional domain category string in lowercase kebab-case, such as `development`, `documentation`, or `project-management`. |
-| `metadata.references` | Optional list of local skill or rule names this skill explicitly uses. |
+| `metadata.references` | Optional list of local skill or rule names this skill explicitly uses or routes to. |

-Use `metadata.references` only when this skill actually uses another local skill or rule as part of its workflow. Include a referenced item when the body tells the agent to use, apply, delegate to, or run that skill/rule before or during this skill's work. Do not include skills that appear only as route-away guidance, adjacent alternatives, near misses, exclusions, or examples of work this skill should not handle.
+Use `metadata.references` when this skill actually uses another local skill or rule as part of its workflow. Include a referenced item when the body tells the agent to use, apply, delegate to, run, or route follow-up work to that skill/rule. Do not include skills that appear only as adjacent alternatives, near misses, exclusions, boundaries, or examples of work this skill should not handle.

 Keep the Markdown body under 500 lines. The body should explain workflow, routing decisions, boundaries, critical rules, and output format. Move deep detail into `references/` and point to it clearly. Do not use a body `Scope` section to describe when the skill should be called; that belongs in `description` per the Agent Skills spec.
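The "Metadata fields" rule and the nested-field table above describe a shape that is easy to check mechanically. The sketch below is one possible check in Python, assuming the frontmatter has already been parsed into a plain mapping (YAML parsing is left out). `check_frontmatter` and the two constant names are hypothetical and are not part of the `create-skill` tooling.

```python
# Top-level keys allowed by the "Metadata fields" rule.
ALLOWED_TOP_LEVEL = {"name", "description", "license", "tags", "metadata"}
# Keys the rule places under `metadata`.
NESTED_ONLY = {"author", "version", "source", "catalog", "category", "references"}


def check_frontmatter(frontmatter: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks fine."""
    problems = []
    for key in frontmatter:
        if key in NESTED_ONLY:
            problems.append(f"'{key}' belongs under 'metadata'")
        elif key not in ALLOWED_TOP_LEVEL:
            problems.append(f"unexpected top-level field '{key}'")

    # `metadata.references` is optional, but when present it is a list of names.
    metadata = frontmatter.get("metadata") or {}
    references = metadata.get("references", [])
    if not isinstance(references, list) or not all(
        isinstance(item, str) for item in references
    ):
        problems.append("'metadata.references' must be a list of skill or rule names")
    return problems


# Trimmed version of the adapt frontmatter from this patch.
adapt_frontmatter = {
    "name": "adapt",
    "description": "Detect evidence-driven change needs.",
    "license": "MIT",
    "tags": ["adaptation", "feedback", "process"],
    "metadata": {"version": "1.0.0", "references": ["create-skill", "create-rule"]},
}
print(check_frontmatter(adapt_frontmatter))
```

The check covers only the shape of the frontmatter. Whether a listed skill is a genuine route or merely an adjacent mention is a judgment about the skill body, so that part of the guidance still needs a human or agent reviewer.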
diff --git a/docs/2026-05-20-general-agent-skills/PRD.md b/docs/2026-05-20-general-agent-skills/PRD.md
index f40b656..87f5d9e 100644
--- a/docs/2026-05-20-general-agent-skills/PRD.md
+++ b/docs/2026-05-20-general-agent-skills/PRD.md
@@ -3,7 +3,7 @@ status: APPROVED
 documentType: PRD
 phase: delivery
 createdAt: "2026-05-20"
-updatedAt: "2026-05-21"
+updatedAt: "2026-05-24"
 author: Oleg Shulyakov
 owner: Oleg Shulyakov
 stakeholders: Users of this agent.md skill library
@@ -18,9 +18,9 @@ related:

 ## Objective

-Create a small set of general-purpose agent skills that cover recurring thinking modes across all projects: asking, explaining, reasoning, classifying, planning, exploring local context, deciding, coordinating, and remembering. Each skill must be useful as a standalone installable unit at runtime, without assuming any other skill is present.
+Create a small set of general-purpose agent skills that cover recurring thinking modes across all projects: asking, explaining, reasoning, classifying, planning, exploring local context, deciding, coordinating, remembering, and adapting. Each skill must be useful as a standalone installable unit at runtime, without assuming any other skill is present.

-The remaining gap is a set of project-agnostic skills for common collaboration modes: when the user wants to ask better questions, understand something, reason through an ambiguous problem, organize messy material, frame work, inspect local project context, choose a direction, coordinate execution, or persist useful project memory.
+The remaining gap is a set of project-agnostic skills for common collaboration modes: when the user wants to ask better questions, understand something, reason through an ambiguous problem, organize messy material, frame work, inspect local project context, choose a direction, coordinate execution, persist useful project memory, or detect when existing behavior no longer fits observed evidence.

 Without these skills, the agent has to infer these broad behaviors from generic instructions each time. That increases trigger ambiguity and makes common collaboration modes less consistent.

@@ -30,8 +30,8 @@ Without these skills, the agent has to infer these broad behaviors from generic

 | Goal ID | Target Outcome | Success Metric |
 | --- | --- | --- |
-| G-1 | Provide a minimal general skill set for common agent collaboration modes. | Nine skills exist: `ask-questions`, `explain-topic`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, and `remember-context`. |
-| G-2 | Keep every skill independently installable. | Each skill works at runtime without requiring, naming, or delegating to another skill. |
+| G-1 | Provide a minimal general skill set for common agent collaboration modes. | Ten skills exist: `ask-questions`, `explain-topic`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, `remember-context`, and `adapt`. |
+| G-2 | Keep every skill independently installable. | Each skill works at runtime without requiring, calling, importing, or delegating to another installed skill. |
 | G-3 | Make trigger behavior predictable. | Each skill has explicit trigger phrases, exclusions, and at least 7 representative eval prompts: 3 true-positive, 2 false-positive, and 2 non-trigger prompts. |
 | G-4 | Keep each skill lightweight and reusable across repositories. | Each `SKILL.md` stays under 500 lines and uses references only when needed. |

@@ -58,6 +58,7 @@ Without these skills, the agent has to infer these broad behaviors from generic
 - `decide-direction`: Compare options and recommend a course of action with tradeoffs, assumptions, and decision criteria.
 - `coordinate-work`: Manage multi-step or multi-agent work by tracking goals, owners, dependencies, status, blockers, handoffs, and next actions.
 - `remember-context`: Capture durable project facts, decisions, and useful observations in `.agents/memory/`.
+- `adapt`: Detect when existing behavior, skills, rules, workflows, docs, evals, or memory conventions no longer fit observed outcomes, user feedback, failures, repeated friction, outdated assumptions, or changed constraints, then identify what should change and which appropriate skill or workflow should make the change.
 - Trigger and exclusion guidance for each skill.
 - Acceptance criteria and eval prompts for skill behavior.
 - Standalone installation guidance for each skill.
@@ -67,6 +68,7 @@ Without these skills, the agent has to infer these broad behaviors from generic
 - Live integrations with Jira, Linear, Confluence, GitHub Issues, or external memory stores.
 - Web search, web browsing, or external/current-information research inside `explore-context`.
 - Automatic memory writes without user intent or clearly durable project value.
+- Direct artifact rewrites by `adapt` unless the user separately asks to proceed with the appropriate update workflow.
 - Replacing project instructions in `AGENTS.md`.

 ### Later
@@ -89,8 +91,9 @@ Without these skills, the agent has to infer these broad behaviors from generic
 | FR-7 | Define `decide-direction` as the skill for choosing among options. | MUST | Triggers on “decide-direction”, “choose”, “which option”, “tradeoffs”, “recommend”, and “should we”. States decision criteria, compares viable options, recommends one, and identifies reversibility or risk. | TBD |
 | FR-8 | Define `coordinate-work` as the skill for managing active work across people, agents, tasks, and dependencies. | MUST | Triggers on “coordinate-work”, “manage this work”, “team lead”, “lead this”, “assign”, “delegate”, “track blockers”, “status”, “handoff”, and multi-agent or multi-workstream requests. Maintains an execution view with goals, owners, dependencies, current status, blockers, and next actions.
| TBD |
 | FR-9 | Define `remember-context` as the skill for durable project memory. | MUST | Triggers when the user asks to remember, save context, record a decision, update memory, or preserve a project fact. When the user explicitly asks to remember something, the memory write is auto-approved and should proceed without asking again. Writes only durable facts, decisions, and observations to `.agents/memory/` according to project conventions. Avoids storing transient task chatter or unverifiable assumptions as fact. | TBD |
-| FR-10 | Document standalone runtime boundaries. | MUST | Each skill defines its own purpose, trigger phrases, non-trigger cases, expected behavior, and output shape without requiring, naming, or delegating to another skill at runtime. | TBD |
-| FR-11 | Add behavior evals. | SHOULD | Each skill has at least 7 representative prompts: 3 true-positive prompts, 2 false-positive prompts, and 2 non-trigger prompts. | TBD |
+| FR-10 | Define `adapt` as the skill for detecting evidence-driven change needs and routing the update. | MUST | Triggers on “adapt based on this”, “what should change after this?”, “fold this feedback into our process”, “this keeps happening”, “we keep hitting this issue”, “this failed, what should change?”, “our instructions didn’t handle this”, “the workflow no longer fits”, “the constraints changed”, “this behavior is outdated”, “adjust future behavior based on this”, “turn this failure into an instruction change”, “what artifact should change because of this?”, and “do we need to update a skill, rule, doc, eval, or memory?”. Identifies the adaptation signal, affected behavior or artifact, smallest useful change, and appropriate follow-up skill or workflow. Does not directly update artifacts by default. | TBD |
+| FR-11 | Document standalone runtime boundaries.
| MUST | Each skill defines its own purpose, trigger phrases, non-trigger cases, expected behavior, and output shape without requiring, calling, importing, or delegating to another skill at runtime. `adapt` may identify an appropriate follow-up skill or workflow as a route, but that route is not a runtime dependency. | TBD |
+| FR-12 | Add behavior evals. | SHOULD | Each skill has at least 7 representative prompts: 3 true-positive prompts, 2 false-positive prompts, and 2 non-trigger prompts. | TBD |

 ---

@@ -107,6 +110,7 @@ Without these skills, the agent has to infer these broad behaviors from generic
 | NFR-7 | Source Discipline | `explore-context` must cite local files, project docs, or attached artifacts and distinguish verified repository facts from inference. It must not perform web search or browsing. |
 | NFR-8 | Memory Hygiene | `remember-context` must preserve useful context without duplicating docs or storing sensitive/transient information. |
 | NFR-9 | Coordination Clarity | `coordinate-work` must keep status, owners, blockers, and next actions explicit enough that another agent or human can continue the work. |
+| NFR-10 | Adaptation Discipline | `adapt` must distinguish durable evidence-driven change signals from one-off exceptions and route actual updates to the appropriate skill or workflow. |

 ---

@@ -115,7 +119,7 @@ Without these skills, the agent has to infer these broad behaviors from generic
 | Milestone | Target Date | Exit Criteria | Owner |
 | --- | --- | --- | --- |
 | M-1 | TBD | Draft PRD approved. | Oleg Shulyakov |
-| M-2 | TBD | Skill descriptions and trigger boundaries drafted for all nine skills. | TBD |
+| M-2 | TBD | Skill descriptions and trigger boundaries drafted for all ten skills. | TBD |
 | M-3 | TBD | `SKILL.md` files created or updated. | TBD |
 | M-4 | TBD | Eval prompts added and trigger overlap checked. | TBD |

@@ -132,6 +136,7 @@ Without these skills, the agent has to infer these broad behaviors from generic
 7. 
A user asks, “Should we build this as a plugin or a skill?” The agent uses `decide-direction`, compares options against explicit criteria, and recommends one.
 8. A user asks, “Lead this migration across frontend, backend, and tests.” The agent uses `coordinate-work` to track workstreams, owners, dependencies, blockers, status, and handoffs.
 9. A user asks, “Remember that we chose skills over plugins for this.” The agent uses `remember-context`, records the durable decision in the appropriate memory file, and keeps the note concise.
+10. A user says, “This failed, what should change?” The agent uses `adapt` to identify the failure signal, determine whether a skill, rule, doc, eval, memory convention, or workflow should change, and route the actual update to the appropriate follow-up skill or workflow.

 ---

@@ -148,6 +153,7 @@ Without these skills, the agent has to infer these broad behaviors from generic
 | R-7 | `reason-problem` could become vague brainstorming without useful output. | MEDIUM | Require a clear problem framing, assumptions, hypotheses or options, and suggested next clarity step. | OPEN |
 | R-8 | `ask-questions` could become an endless questionnaire. | MEDIUM | Require prioritized questions and a bias toward the smallest question set that changes the next action. | OPEN |
 | R-9 | `classify-content` could force false precision. | MEDIUM | Require explicit grouping criteria, ambiguous cases, and optional multi-label classifications when needed. | OPEN |
+| R-10 | `adapt` could become a generic “improve anything” skill or rewrite artifacts directly. | MEDIUM | Define `adapt` as evidence-driven change detection and routing only; actual updates belong to the appropriate artifact-specific skill or workflow.
| OPEN |

 ---

@@ -156,7 +162,7 @@ Without these skills, the agent has to infer these broad behaviors from generic
 | Dependency ID | Item | Impacted Requirements | Validation Owner |
 | --- | --- | --- | --- |
 | D-1 | Existing `.agents/memory/` conventions | FR-9, NFR-8 | Oleg Shulyakov |
-| D-2 | `create-skill` validation workflow for development-time checks only | FR-11 | TBD |
+| D-2 | `create-skill` validation workflow for development-time checks only | FR-12 | TBD |

 ---

@@ -172,5 +178,7 @@ Without these skills, the agent has to infer these broad behaviors from generic
 | Q-6 | Should the ambiguous-problem skill be named `reason-problem`, `think`, or `brainstorm`? | Decided: `reason-problem`, because it covers brainstorming, framing, assumptions, and argument-testing without being limited to idea generation. | Oleg Shulyakov | 2026-05-21 |
 | Q-7 | Should question generation be its own skill or part of `reason-problem`? | Decided: keep `ask-questions` separate because identifying the right questions is a distinct output and often useful before any reasoning path is chosen. | Oleg Shulyakov | 2026-05-21 |
 | Q-8 | Should grouping be named `classify-content`, `sort`, or `categorize`? | Decided: `classify-content`, because it covers category assignment, similarity/difference grouping, taxonomies, and edge cases more precisely than `sort`. | Oleg Shulyakov | 2026-05-21 |
+| Q-9 | Should the evidence-driven change-detection skill be named `evolve` or `adapt`? | Decided: `adapt`, because it detects when existing behavior or artifacts no longer fit observed evidence without implying autonomous self-modification. | Oleg Shulyakov | 2026-05-24 |
+| Q-10 | Should `adapt` perform the actual update to skills, rules, docs, evals, or memory? | Decided: no. `adapt` diagnoses the need and routes the update; artifact-specific skills such as `create-skill`, `create-rule`, `write-*`, `write-tests`, or `remember-context` perform the actual change when requested.
| Oleg Shulyakov | 2026-05-24 |

 ---
diff --git a/docs/2026-05-20-general-agent-skills/SPEC.md b/docs/2026-05-20-general-agent-skills/SPEC.md
index 05b13d2..2a645de 100644
--- a/docs/2026-05-20-general-agent-skills/SPEC.md
+++ b/docs/2026-05-20-general-agent-skills/SPEC.md
@@ -4,7 +4,7 @@ documentType: SPEC
 phase: delivery
 version: 1.0
 createdAt: "2026-05-20"
-updatedAt: "2026-05-21"
+updatedAt: "2026-05-24"
 author: Oleg Shulyakov
 tags:
   - skills
@@ -19,7 +19,7 @@ related:

 ### 1.1 Purpose

-This spec defines the implementation contract for nine standalone, general-purpose agent skills: `ask-questions`, `explain-topic`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, and `remember-context`.
+This spec defines the implementation contract for ten standalone, general-purpose agent skills: `ask-questions`, `explain-topic`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, `remember-context`, and `adapt`.

 The goal is to make common collaboration modes predictable at runtime without requiring any skill to depend on another installed skill.

@@ -42,16 +42,16 @@ This work creates a small general layer with clear trigger boundaries, exclusion

 ### 1.4 Customer & Business Context

-The primary users are individual developers, maintainers, and project leads who want consistent collaboration behavior for asking, explaining, reasoning, classifying, planning, exploring local context, deciding, coordinating, and remembering.
+The primary users are individual developers, maintainers, and project leads who want consistent collaboration behavior for asking, explaining, reasoning, classifying, planning, exploring local context, deciding, coordinating, remembering, and adapting to evidence that existing behavior no longer fits.

-Success means a user can install any one of the nine skills independently and get useful behavior for that mode without hidden runtime coupling.
+Success means a user can install any one of the ten skills independently and get useful behavior for that mode without hidden runtime coupling.

 ### 1.5 Goals

 | Goal | Success Metric | Target |
 | --- | --- | --- |
-| Minimal general skill set | Nine skills exist with approved names | `ask-questions`, `explain-topic`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, `remember-context` |
-| Standalone runtime behavior | No skill requires another skill to be installed, named, or delegated to | 100% of skills |
+| Minimal general skill set | Ten skills exist with approved names | `ask-questions`, `explain-topic`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, `remember-context`, `adapt` |
+| Standalone runtime behavior | No skill requires another skill to be installed, called, imported, or delegated to | 100% of skills |
 | Predictable triggers | Each skill documents triggers, exclusions, expected behavior, and eval prompts | 8-10 eval prompts where possible, never fewer than 7 |
 | Lightweight packaging | Main `SKILL.md` files remain concise | Under 500 lines each |

@@ -226,7 +226,21 @@ Each skill body shall define purpose, scope, trigger cases, non-trigger cases, w
 - [ ] Avoids storing transient task chatter, sensitive information, or unverifiable assumptions as fact.
 - [ ] Follows existing `.agents/memory/MEMORY.md` and dated memory file conventions.

-#### FR-010: Standalone Runtime Boundaries
+#### FR-010: `adapt`
+
+**Priority:** Must-have
+**Description:** The system shall use `adapt` to detect when existing behavior, skills, rules, workflows, docs, evals, or memory conventions no longer fit observed outcomes, user feedback, failures, repeated friction, outdated assumptions, or changed constraints.
+
+**Acceptance criteria:**
+
+- [ ] Triggers on "adapt based on this", "what should change after this?", "fold this feedback into our process", "this keeps happening", "we keep hitting this issue", "this failed, what should change?", "our instructions didn't handle this", "the workflow no longer fits", "the constraints changed", "this behavior is outdated", "adjust future behavior based on this", "turn this failure into an instruction change", "what artifact should change because of this?", and "do we need to update a skill, rule, doc, eval, or memory?".
+- [ ] Identifies the adaptation signal: observed outcome, user feedback, failure, repeated friction, outdated assumption, or changed constraint.
+- [ ] Distinguishes durable evidence-driven change needs from one-off exceptions.
+- [ ] Identifies the affected behavior or artifact, such as a skill, rule, workflow, document, eval, memory convention, or process.
+- [ ] Recommends the smallest useful change and the appropriate follow-up skill or workflow for the actual update.
+- [ ] Does not directly rewrite artifacts by default; actual updates belong to artifact-specific skills such as `create-skill`, `create-rule`, `write-*`, `write-tests`, or `remember-context` when the user asks to proceed.
+
+#### FR-011: Standalone Runtime Boundaries

 **Priority:** Must-have
 **Description:** Each skill shall define complete runtime behavior without requiring another skill.
@@ -234,10 +248,11 @@ Each skill body shall define purpose, scope, trigger cases, non-trigger cases, w
 **Acceptance criteria:**

 - [ ] No `SKILL.md` says the runtime must use, call, import, or delegate to another skill.
+- [ ] `adapt` may identify an appropriate follow-up skill or workflow as a route, but that route is not a runtime dependency.
 - [ ] Any development-time references to authoring or validation workflows are clearly not runtime dependencies.
 - [ ] Shared concepts may be repeated where needed to preserve standalone behavior.
-#### FR-011: Behavior Evals +#### FR-012: Behavior Evals **Priority:** Should-have **Description:** Each skill shall include representative eval prompts for trigger behavior. @@ -253,7 +268,7 @@ Each skill body shall define purpose, scope, trigger cases, non-trigger cases, w ### 2.5 Business Rules -**BR-001:** Skill names are fixed as `ask-questions`, `explain-topic`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, and `remember-context`. +**BR-001:** Skill names are fixed as `ask-questions`, `explain-topic`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, `remember-context`, and `adapt`. **BR-002:** Runtime behavior must be standalone. Development-time validation may use existing creator or packaging workflows, but installed skill behavior must not depend on them. @@ -261,7 +276,9 @@ Each skill body shall define purpose, scope, trigger cases, non-trigger cases, w **BR-004:** `remember-context` may write memory automatically only when the user explicitly asks to remember or preserve something. -**BR-005:** Durable task documentation belongs under `docs/`; durable memory facts and small implementation notes belong under `.agents/memory/`. +**BR-005:** `adapt` detects and routes evidence-driven change needs. It shall not directly rewrite skills, rules, docs, evals, or memory by default. + +**BR-006:** Durable task documentation belongs under `docs/`; durable memory facts and small implementation notes belong under `.agents/memory/`. 
--- @@ -276,6 +293,7 @@ Each skill body shall define purpose, scope, trigger cases, non-trigger cases, w | Source discipline | `explore-context` cites local evidence and marks inference | Findings include file/artifact references when available | High | | Memory hygiene | `remember-context` stores only durable value | No transient chatter or sensitive data in memory notes | High | | Coordination clarity | `coordinate-work` preserves execution state | Goals, owners, status, blockers, dependencies, and next actions are explicit | Medium | +| Adaptation discipline | `adapt` diagnoses evidence-driven change needs without becoming a generic update workflow | Actual artifact changes are routed to the appropriate follow-up skill or workflow | High | --- @@ -385,6 +403,8 @@ Boundary prompts shall specifically test likely overlaps: | `explain-topic` vs `explore-context` | `explain-topic` teaches; `explore-context` investigates local evidence | | `classify-content` vs `decide-direction` | `classify-content` groups material; `decide-direction` chooses a direction | | `remember-context` vs docs writing | `remember-context` captures durable memory; docs writing creates formal project artifacts | +| `adapt` vs `create-skill` / `create-rule` / docs writing | `adapt` identifies what should change and routes the update; artifact-specific skills perform the actual update | +| `adapt` vs `remember-context` | `adapt` identifies future behavior or artifact changes; `remember-context` preserves durable facts and decisions | ### 8.4 Manual Acceptance @@ -396,7 +416,7 @@ Manual acceptance passes when a reviewer can invoke representative prompts and o ### Phase 1: Skill Boundaries -- [ ] Draft `SKILL.md` for `ask-questions`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, and `remember-context`. 
+- [ ] Draft `SKILL.md` for `ask-questions`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, `remember-context`, and `adapt`. - [x] Treat existing `explain-topic` as complete for this work. - [ ] Confirm each skill has clear trigger and non-trigger rules. @@ -450,6 +470,8 @@ Manual acceptance passes when a reviewer can invoke representative prompts and o | 1 | Should eval prompts be plain Markdown or a machine-readable format? | Oleg Shulyakov | 2026-05-21 | Resolved: evals are generated by `.agents/skills/create-skill/`. | | 2 | Should `explain-topic` be treated as already complete or revised to match the new general skill set style? | Oleg Shulyakov | 2026-05-21 | Resolved: mark `explain-topic` as complete. | | 3 | Should every new skill use version `1.0.0`, or inherit a project-wide initial version convention? | Oleg Shulyakov | 2026-05-21 | Resolved: use `1.0.0` as the initial version. | +| 4 | Should the evidence-driven change-detection skill be named `evolve` or `adapt`? | Oleg Shulyakov | 2026-05-24 | Resolved: use `adapt`, because it detects that existing behavior no longer fits evidence without implying autonomous self-modification. | +| 5 | Should `adapt` perform the actual updates it identifies? | Oleg Shulyakov | 2026-05-24 | Resolved: no. It diagnoses and routes updates to artifact-specific skills or workflows. 
| --- diff --git a/docs/2026-05-20-general-agent-skills/user-stories/US-001-author-standalone-general-skills.md b/docs/2026-05-20-general-agent-skills/user-stories/US-001-author-standalone-general-skills.md index 0cda3a4..6ab69ea 100644 --- a/docs/2026-05-20-general-agent-skills/user-stories/US-001-author-standalone-general-skills.md +++ b/docs/2026-05-20-general-agent-skills/user-stories/US-001-author-standalone-general-skills.md @@ -14,14 +14,14 @@ Source documents: - **Persona:** As a skill library maintainer, - **Action:** I want the missing general-purpose agent skills authored as standalone installable skill folders, - **Outcome:** so that users can invoke consistent collaboration modes without hidden runtime dependencies. -- **Epic Context:** Implements the approved General Agent Skills PRD/SPEC by creating `ask-questions`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, and `remember-context`. Existing `explain-topic` is already complete and must not be rewritten unless validation reveals a spec violation. +- **Epic Context:** Implements the approved General Agent Skills PRD/SPEC by creating `ask-questions`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, `remember-context`, and `adapt`. Existing `explain-topic` is already complete and must not be rewritten unless validation reveals a spec violation. --- ## 🔍 2. Strict Constraints & Scope Boundaries - **In-Scope:** - - Create `.agents/skills//SKILL.md` for `ask-questions`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, and `remember-context`. + - Create `.agents/skills//SKILL.md` for `ask-questions`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, `remember-context`, and `adapt`. - Use initial skill version `1.0.0`. 
- Include frontmatter fields required by local skill conventions. - Define each skill's purpose, trigger cases, non-trigger cases, workflow, output expectations, error paths, and verification guidance where relevant. @@ -49,7 +49,7 @@ Source documents: ```gherkin Scenario: Create standalone skill instructions - Given the approved SPEC defines eight new general skills + Given the approved SPEC defines nine new general skills When the agent creates the new skill folders and SKILL.md files Then each new skill folder exists under .agents/skills/ And each SKILL.md includes name, description, license, version, tags, author, and metadata frontmatter @@ -82,6 +82,7 @@ Scenario: Avoid placeholder support folders 6. `.agents/skills/decide-direction/SKILL.md` -> New decision support skill. 7. `.agents/skills/coordinate-work/SKILL.md` -> New coordination skill. 8. `.agents/skills/remember-context/SKILL.md` -> New durable memory skill. + 9. `.agents/skills/adapt/SKILL.md` -> New evidence-driven adaptation diagnosis and routing skill. - **Shared Dependencies/Imports:** - Follow `.agents/skills/create-skill/references/authoring.md`. - Use [SPEC.md](../SPEC.md) as the implementation contract. @@ -94,7 +95,7 @@ Scenario: Avoid placeholder support folders *Note to Agent: Execute these steps sequentially. Verify state after each step.* 1. **Analyze & Validate:** Read [SPEC.md](../SPEC.md), `.agents/skills/create-skill/SKILL.md`, and `.agents/skills/create-skill/references/authoring.md`. -2. **Create Skill Folders:** Create only the eight missing skill directories and required files. +2. **Create Skill Folders:** Create only the nine missing skill directories and required files. 3. **Author Skill Instructions:** Write focused `SKILL.md` files with explicit trigger and non-trigger behavior. 4. **Check Runtime Boundaries:** Search new skill files for runtime dependency language that points to another skill. 5. 
**Check Folder Hygiene:** Confirm no placeholder support folders were created. diff --git a/docs/2026-05-20-general-agent-skills/user-stories/US-002-generate-skill-evals.md b/docs/2026-05-20-general-agent-skills/user-stories/US-002-generate-skill-evals.md index 743831b..1a33f95 100644 --- a/docs/2026-05-20-general-agent-skills/user-stories/US-002-generate-skill-evals.md +++ b/docs/2026-05-20-general-agent-skills/user-stories/US-002-generate-skill-evals.md @@ -14,7 +14,7 @@ Source documents: - **Persona:** As a skill library maintainer, - **Action:** I want representative evals generated for each general skill, - **Outcome:** so that trigger behavior and near-miss boundaries can be reviewed before release. -- **Epic Context:** Implements FR-011 from the approved SPEC. Evals are generated through `.agents/skills/create-skill/` and stored inside each skill folder. +- **Epic Context:** Implements FR-012 from the approved SPEC. Evals are generated through `.agents/skills/create-skill/` and stored inside each skill folder. --- @@ -43,7 +43,7 @@ Source documents: ```gherkin Scenario: Generate evals for each skill - Given the eight new general skills exist + Given the nine new general skills exist When the agent generates evals through create-skill conventions Then each new skill has evals/evals.json And each eval file contains 8-10 realistic prompts where possible @@ -77,6 +77,7 @@ Scenario: Preserve eval folder discipline 6. `.agents/skills/decide-direction/evals/evals.json` -> Trigger and output evals. 7. `.agents/skills/coordinate-work/evals/evals.json` -> Trigger and output evals. 8. `.agents/skills/remember-context/evals/evals.json` -> Trigger and output evals. + 9. `.agents/skills/adapt/evals/evals.json` -> Trigger and output evals. - **Shared Dependencies/Imports:** - Follow `.agents/skills/create-skill/references/evaluation.md`. - Use boundary distinctions from [SPEC.md](../SPEC.md).