diff --git a/.agents/skills/README.md b/.agents/skills/README.md index 8365c35..87f0e7f 100644 --- a/.agents/skills/README.md +++ b/.agents/skills/README.md @@ -8,16 +8,24 @@ A complete skill is a directory with a required `SKILL.md` file and optional bun | Skill | Use it for | Notable resources | | --- | --- | --- | +| [`ask-questions`](ask-questions/SKILL.md) | Generating high-leverage questions, clarifying missing context, and surfacing assumptions. | [`evals/`](ask-questions/evals/) | | [`audit-skill-security`](audit-skill-security/SKILL.md) | Auditing third-party or local skills before installing, updating, or trusting them. | [`references/audit-protocol.md`](audit-skill-security/references/audit-protocol.md) | +| [`classify-content`](classify-content/SKILL.md) | Organizing material into meaningful groups by criteria, similarity, priority, dependency, or abstraction level. | [`evals/`](classify-content/evals/) | | [`build-backend`](build-backend/SKILL.md) | Production backend code: APIs, services, middleware, workers, persistence, validation, auth, and backend tests. | [`references/`](build-backend/references/), [`evals/`](build-backend/evals/) | | [`build-database`](build-database/SKILL.md) | Database code: schemas, DDL, OLTP SQL, analytics SQL, migrations, indexes, stored procedures, and dialect-specific scripts. | [`references/`](build-database/references/), [`evals/`](build-database/evals/) | | [`build-frontend`](build-frontend/SKILL.md) | Production frontend code: components, routes, client state, forms, styling, accessibility, performance, PWA behavior, and visualization. | [`references/`](build-frontend/references/), [`evals/`](build-frontend/evals/) | | [`write-tests`](write-tests/SKILL.md) | Automated tests and evals, including E2E, API, integration, performance, AI output, tool-use, RAG, and prompt regression suites. 
| [`references/`](write-tests/references/), [`scripts/`](write-tests/scripts/), [`evals/`](write-tests/evals/) | +| [`coordinate-work`](coordinate-work/SKILL.md) | Managing active work across people, agents, tasks, dependencies, blockers, status, and handoffs. | [`evals/`](coordinate-work/evals/) | | [`create-rule`](create-rule/SKILL.md) | Writing or improving agent rules, instruction files, `AGENTS.md`, `CLAUDE.md`, Cursor rules, Copilot instructions, and `.agents/rules/*.md`. | [`scripts/`](create-rule/scripts/), [`evals/`](create-rule/evals/) | | [`create-skill`](create-skill/SKILL.md) | Creating, editing, reviewing, evaluating, packaging, optimizing, or improving skills. Start here for skill authoring. | [`references/`](create-skill/references/), [`scripts/`](create-skill/scripts/), [`eval-viewer/`](create-skill/eval-viewer/), [`agents/`](create-skill/agents/), [`assets/`](create-skill/assets/), [`evals/`](create-skill/evals/) | +| [`decide-direction`](decide-direction/SKILL.md) | Comparing options, weighing tradeoffs, and recommending a direction using explicit criteria. | [`evals/`](decide-direction/evals/) | | [`design-api`](design-api/SKILL.md) | Contract-first API design for OpenAPI, AsyncAPI, GraphQL, endpoints, schemas, and request/response shapes. | [`references/`](design-api/references/), [`evals/`](design-api/evals/) | -| [`explain`](explain/SKILL.md) | Explaining general knowledge, concepts, code, behavior, design, architecture, APIs, data flow, and tradeoffs in simple terms. | [`evals/`](explain/evals/) | +| [`explain-topic`](explain-topic/SKILL.md) | Explaining general knowledge, concepts, code, behavior, design, architecture, APIs, data flow, and tradeoffs in simple terms. | [`evals/`](explain-topic/evals/) | +| [`explore-context`](explore-context/SKILL.md) | Investigating local repository, project document, and attached-artifact context with evidence. 
| [`evals/`](explore-context/evals/) | | [`manage-git`](manage-git/SKILL.md) | Git branch naming, branch actions, commit-message drafting, and committing staged changes. | [`references/`](manage-git/references/), [`evals/`](manage-git/evals/) | +| [`plan-work`](plan-work/SKILL.md) | Sequencing work before execution with phases, dependencies, risks, validation, and next actions. | [`evals/`](plan-work/evals/) | +| [`reason-problem`](reason-problem/SKILL.md) | Working through ambiguous problems, assumptions, hypotheses, and problem framing before deciding or planning. | [`evals/`](reason-problem/evals/) | +| [`remember-context`](remember-context/SKILL.md) | Preserving durable project facts, decisions, conventions, and useful observations in `.agents/memory/`. | [`evals/`](remember-context/evals/) | | [`review-code`](review-code/SKILL.md) | Reviewing code changes, diffs, pull requests, branches, or patches for correctness, regressions, security, performance, and test gaps. | [`references/`](review-code/references/), [`evals/`](review-code/evals/) | | [`write-prd`](write-prd/SKILL.md) | Product requirements, product briefs, feature requirements, product scope, and launch requirements. | [`references/`](write-prd/references/), [`evals/`](write-prd/evals/) | | [`write-spec`](write-spec/SKILL.md) | Technical specs, design docs, functional and non-functional requirements, data contracts, UI specs, release specs, and handoff docs. | [`references/`](write-spec/references/), [`evals/`](write-spec/evals/) | diff --git a/.agents/skills/ask-questions/SKILL.md b/.agents/skills/ask-questions/SKILL.md new file mode 100644 index 0000000..bf4d42b --- /dev/null +++ b/.agents/skills/ask-questions/SKILL.md @@ -0,0 +1,70 @@ +--- +name: ask-questions +description: Generate high-leverage questions and clarify missing context. Use for "ask", "what should I ask", "right questions", "what are we missing", "clarify this", and ambiguous requests blocked by unknowns. 
+license: MIT +version: 1.0.0 +tags: + - ask + - clarification + - questions +author: Oleg Shulyakov +metadata: + catalog: utility +--- + +# ask-questions + +Generate the smallest useful set of questions that would change the next action. + +## Scope + +**Use this skill when the useful output is questions, assumptions, or missing context.** + +- **Trigger on clarification**: use for requests such as "what should I ask", "what are the right questions", "what are we missing", "clarify this", and ambiguous requests where progress depends on missing context. +- **Surface gaps**: identify unknown goals, constraints, stakeholders, acceptance criteria, data, ownership, risks, and decision criteria. +- **Stay question-first**: do not make decisions, produce implementation plans, or change files as the primary output. +- **Avoid questionnaires**: ask only the few questions likely to affect the next move unless the user explicitly requests a full discovery list. + +--- + +## Workflow + +**Prioritize questions by how much they reduce uncertainty or rework.** + +1. Restate the blocked decision or task in one sentence when helpful. +2. Separate known facts from assumptions and missing context. +3. Select the fewest questions that would materially change the next action. +4. Order questions by leverage, dependency, or urgency. +5. Add default assumptions only when they let work proceed despite unanswered questions. + +--- + +## Output + +**Return questions the user can answer or send to someone else without cleanup.** + +- **Lead with the core gap**: name the uncertainty that makes the questions necessary. +- **Use short lists when useful**: prefer three to seven prioritized questions for normal work. +- **Group only when needed**: use categories such as goal, scope, risk, data, owner, and acceptance criteria only if they improve scanability. +- **Mark blockers**: distinguish must-answer questions from nice-to-have questions. 
+- **Include assumptions sparingly**: list assumptions only when they affect the question set or proposed next step. + +--- + +## Error Paths + +**When the request is too broad, narrow the questions instead of expanding endlessly.** + +- **No clear domain**: ask one question about the intended context before generating a detailed set. +- **Too many unknowns**: provide a first-pass discovery set and name what would refine it. +- **User asks for action too**: answer the question-generation part first, then state what can proceed after the answers. + +--- + +## Verification + +**Check that every question earns its place.** + +- **Remove decorative questions**: delete questions whose answer would not change scope, approach, risk, or acceptance. +- **Check boundaries**: if the output is mainly a recommendation, plan, explanation, or classification, this skill is no longer the right mode. +- **Preserve uncertainty**: do not present assumptions as facts. diff --git a/.agents/skills/ask-questions/evals/evals.json b/.agents/skills/ask-questions/evals/evals.json new file mode 100644 index 0000000..78e65eb --- /dev/null +++ b/.agents/skills/ask-questions/evals/evals.json @@ -0,0 +1,60 @@ +{ + "evals": [ + { + "id": "ask-questions-001", + "category": "true-positive", + "prompt": "What should I ask the team before we commit to this migration approach?", + "expected_trigger": "Trigger ask-questions.", + "expected_output": "Produces a short prioritized set of migration questions, with blockers separated from nice-to-have context." + }, + { + "id": "ask-questions-002", + "category": "true-positive", + "prompt": "Clarify this feature request and tell me what context is missing.", + "expected_trigger": "Trigger ask-questions.", + "expected_output": "Identifies missing goals, users, scope, acceptance criteria, and assumptions without planning implementation." 
+ }, + { + "id": "ask-questions-003", + "category": "true-positive", + "prompt": "What are the right questions to ask before choosing a vendor?", + "expected_trigger": "Trigger ask-questions.", + "expected_output": "Returns high-leverage vendor selection questions ordered by decision impact." + }, + { + "id": "ask-questions-004", + "category": "true-positive", + "prompt": "We have an ambiguous request to improve onboarding. What are we missing?", + "expected_trigger": "Trigger ask-questions.", + "expected_output": "Surfaces missing user, workflow, metric, constraint, and success-context questions." + }, + { + "id": "ask-questions-005", + "category": "false-positive", + "prompt": "Decide whether we should build or buy the onboarding tool.", + "expected_trigger": "Do not trigger ask-questions as the primary skill.", + "expected_output": "Should compare options and recommend a direction rather than only generating questions." + }, + { + "id": "ask-questions-006", + "category": "false-positive", + "prompt": "Plan the onboarding migration in milestones.", + "expected_trigger": "Do not trigger ask-questions as the primary skill.", + "expected_output": "Should sequence work and include risks or verification; questions may appear only as blockers." + }, + { + "id": "ask-questions-007", + "category": "non-trigger", + "prompt": "Explain how OAuth refresh tokens work.", + "expected_trigger": "Do not trigger ask-questions.", + "expected_output": "Should explain the concept, not generate a discovery questionnaire." + }, + { + "id": "ask-questions-008", + "category": "non-trigger", + "prompt": "Update the README to document the install command.", + "expected_trigger": "Do not trigger ask-questions.", + "expected_output": "Should perform documentation work unless clarification is truly blocking." 
+ } + ] +} diff --git a/.agents/skills/classify-content/SKILL.md b/.agents/skills/classify-content/SKILL.md new file mode 100644 index 0000000..a4b3bba --- /dev/null +++ b/.agents/skills/classify-content/SKILL.md @@ -0,0 +1,70 @@ +--- +name: classify-content +description: Organize material into meaningful groups. Use for "classify", "categorize", "group", "cluster", "sort", "taxonomy", "organize these", and grouping by criteria, priority, dependency, similarity, or abstraction level. +license: MIT +version: 1.0.0 +tags: + - classify + - taxonomy + - organization +author: Oleg Shulyakov +metadata: + catalog: utility +--- + +# classify-content + +Group material by explicit criteria while preserving edge cases. + +## Scope + +**Use this skill when the primary task is assigning items to meaningful groups.** + +- **Trigger on grouping**: use for "classify", "categorize", "group", "cluster", "sort", "taxonomy", "organize these", and requests to group items by explicit criteria. +- **Support many criteria**: group by similarity, difference, category, priority, dependency, abstraction level, user need, risk, ownership, or another stated lens. +- **Respect ambiguity**: keep multi-fit, unclear, or unclassified items visible instead of forcing false precision. +- **Do not decide by default**: classification may inform a decision, but the primary output is labeled organization. + +--- + +## Workflow + +**State the grouping lens before assigning items.** + +1. Identify the items to classify and any user-provided criteria. +2. Define or infer the grouping criteria, marking inferred criteria as assumptions. +3. Create clear group labels with short definitions. +4. Place each item into one or more groups as appropriate. +5. Call out ambiguous, duplicate, out-of-scope, or unclassified items. + +--- + +## Output + +**Make the taxonomy easy to inspect and revise.** + +- **Lead with criteria**: state the grouping rule before or alongside the groups. 
+- **Use stable labels**: choose labels that describe the underlying reason items belong together. +- **Preserve source text**: keep item names recognizable unless normalization is requested. +- **Explain edge cases**: briefly note why ambiguous items are multi-fit or unresolved. +- **Offer refinements**: suggest a better lens only when the requested criteria produce weak groups. + +--- + +## Error Paths + +**When the items or criteria are unclear, classify what can be classified and isolate the rest.** + +- **No criteria provided**: infer a practical lens and state it as an assumption. +- **Too little item detail**: group by observable wording and list what context would improve accuracy. +- **Conflicting criteria**: choose the primary criterion first, then note secondary tags if useful. + +--- + +## Verification + +**Check for useful categories, not tidy-looking fiction.** + +- **Every group has a reason**: remove or merge groups whose distinction does not matter. +- **Every item is accounted for**: placed, multi-labeled, or explicitly unclassified. +- **Ambiguity remains visible**: do not hide uncertainty to make the table look clean. diff --git a/.agents/skills/classify-content/evals/evals.json b/.agents/skills/classify-content/evals/evals.json new file mode 100644 index 0000000..2d189ac --- /dev/null +++ b/.agents/skills/classify-content/evals/evals.json @@ -0,0 +1,60 @@ +{ + "evals": [ + { + "id": "classify-content-001", + "category": "true-positive", + "prompt": "Classify these feature requests by underlying user need.", + "expected_trigger": "Trigger classify-content.", + "expected_output": "States grouping criteria, labels groups, assigns requests, and flags ambiguous items." 
+ }, + { + "id": "classify-content-002", + "category": "true-positive", + "prompt": "Group these incidents by root cause pattern and note any multi-fit cases.", + "expected_trigger": "Trigger classify-content.", + "expected_output": "Groups incidents by root cause pattern and preserves incidents that belong to multiple groups." + }, + { + "id": "classify-content-003", + "category": "true-positive", + "prompt": "Build a taxonomy for these support tickets.", + "expected_trigger": "Trigger classify-content.", + "expected_output": "Creates clear category labels and places tickets into the taxonomy." + }, + { + "id": "classify-content-004", + "category": "true-positive", + "prompt": "Sort these tasks by dependency and priority.", + "expected_trigger": "Trigger classify-content.", + "expected_output": "Defines dependency and priority criteria, then organizes tasks accordingly." + }, + { + "id": "classify-content-005", + "category": "false-positive", + "prompt": "Decide which of these three projects we should fund first.", + "expected_trigger": "Do not trigger classify-content as the primary skill.", + "expected_output": "Should compare options and recommend a project using decision criteria." + }, + { + "id": "classify-content-006", + "category": "false-positive", + "prompt": "Plan the order for implementing these tasks.", + "expected_trigger": "Do not trigger classify-content as the primary skill.", + "expected_output": "Should produce an execution sequence, not just grouped categories." + }, + { + "id": "classify-content-007", + "category": "non-trigger", + "prompt": "Explain why the cache is invalidated on write.", + "expected_trigger": "Do not trigger classify-content.", + "expected_output": "Should explain behavior." 
+ }, + { + "id": "classify-content-008", + "category": "non-trigger", + "prompt": "Investigate where the billing webhook is handled.", + "expected_trigger": "Do not trigger classify-content.", + "expected_output": "Should inspect local context and cite files." + } + ] +} diff --git a/.agents/skills/coordinate-work/SKILL.md b/.agents/skills/coordinate-work/SKILL.md new file mode 100644 index 0000000..2d4d5a6 --- /dev/null +++ b/.agents/skills/coordinate-work/SKILL.md @@ -0,0 +1,71 @@ +--- +name: coordinate-work +description: Manage active work across people, agents, tasks, dependencies, blockers, and handoffs. Use for "coordinate", "manage", "lead this", "assign", "delegate", "track blockers", "status", "handoff", and multi-workstream execution. +license: MIT +version: 1.0.0 +tags: + - coordinate-work + - execution + - handoff +author: Oleg Shulyakov +metadata: + catalog: utility +--- + +# coordinate-work + +Keep active work understandable across owners, dependencies, blockers, and handoffs. + +## Scope + +**Use this skill when execution is active or split across workstreams.** + +- **Trigger on coordination**: use for "coordinate", "manage", "team lead", "lead this", "assign", "delegate", "track blockers", "status", "handoff", and multi-agent or multi-workstream requests. +- **Track execution**: maintain goals, owners, dependencies, current status, blockers, decisions, and next actions. +- **Separate from planning**: planning sequences future work; coordination keeps active work moving and handoff-ready. +- **Do not invent authority**: do not silently assign real people without user-provided ownership or clearly stated assumptions. + +--- + +## Workflow + +**Maintain a compact execution view that another person or agent can resume from.** + +1. Identify the goal, active workstreams, stakeholders, and ownership. +2. Capture status for each workstream: not started, in progress, blocked, review, done, or unknown. +3. Map dependencies and blockers. +4. 
Define next actions with owner or assumed owner. +5. Update the view as new information arrives. +6. Produce handoff notes when work pauses or transfers. + +--- + +## Output + +**Make status, ownership, blockers, and next actions explicit.** + +- **Lead with current state**: summarize whether work is on track, blocked, or needs a decision. +- **Use an execution table when useful**: include workstream, owner, status, blocker, dependency, and next action. +- **Separate assumptions**: mark assumed owners, priorities, deadlines, or statuses. +- **Preserve handoff state**: include enough context for continuation without rereading the whole thread. +- **Avoid over-documenting**: keep the view proportional to the number of workstreams. + +--- + +## Error Paths + +**When ownership or status is unclear, expose the gap and keep work moving where possible.** + +- **Unknown owners**: use "unassigned" or "assumed owner" instead of inventing responsibility. +- **Blocked work**: name the blocker, impact, and unblock action. +- **Conflicting updates**: keep the latest known state and identify the conflict. + +--- + +## Verification + +**Check that another capable person could continue from the coordination view.** + +- **Every active stream has a next action**: done or blocked streams should say why. +- **Dependencies are visible**: downstream work should show what it waits on. +- **Handoff is concrete**: include open decisions, files, commands, artifacts, and validation status when relevant. 
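Reviewer note: the execution view and verification checks described in `coordinate-work/SKILL.md` above could be sketched roughly as follows. This is a minimal illustration, not part of the skill: the `Workstream` class, its field names, and `check_view` are hypothetical, and only the status values and the "unassigned" owner label come from the skill text.

```python
from dataclasses import dataclass, field

# Status values listed in workflow step 2 of the skill.
STATUSES = {"not started", "in progress", "blocked", "review", "done", "unknown"}


@dataclass
class Workstream:
    """Hypothetical row of the execution table (field names are assumptions)."""
    name: str
    status: str = "unknown"
    owner: str = "unassigned"   # default avoids inventing responsibility
    owner_assumed: bool = False  # mark assumed owners explicitly
    blocker: str = ""
    depends_on: list[str] = field(default_factory=list)
    next_action: str = ""


def check_view(streams: list[Workstream]) -> list[str]:
    """Return readable gaps in the view; an empty list means the checks pass."""
    known = {s.name for s in streams}
    gaps = []
    for s in streams:
        if s.status not in STATUSES:
            gaps.append(f"{s.name}: unrecognized status {s.status!r}")
        # Blocked work should name its blocker.
        if s.status == "blocked" and not s.blocker:
            gaps.append(f"{s.name}: blocked without a named blocker")
        # Every active stream needs a next action.
        if s.status not in {"done", "blocked"} and not s.next_action:
            gaps.append(f"{s.name}: missing next action")
        # Dependencies should point at workstreams that are actually tracked.
        for dep in s.depends_on:
            if dep not in known:
                gaps.append(f"{s.name}: depends on untracked workstream {dep!r}")
    return gaps
```

Returning a list of gaps rather than raising keeps the view usable while still exposing what is missing, which matches the skill's guidance to expose gaps and keep work moving.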
diff --git a/.agents/skills/coordinate-work/evals/evals.json b/.agents/skills/coordinate-work/evals/evals.json new file mode 100644 index 0000000..fa342ef --- /dev/null +++ b/.agents/skills/coordinate-work/evals/evals.json @@ -0,0 +1,60 @@ +{ + "evals": [ + { + "id": "coordinate-work-001", + "category": "true-positive", + "prompt": "Coordinate this migration across frontend, backend, and QA.", + "expected_trigger": "Trigger coordinate-work.", + "expected_output": "Creates an execution view with workstreams, owners or assumptions, dependencies, blockers, and next actions." + }, + { + "id": "coordinate-work-002", + "category": "true-positive", + "prompt": "Lead this work and keep track of blockers and handoffs.", + "expected_trigger": "Trigger coordinate-work.", + "expected_output": "Tracks active status, blockers, handoff state, and next actions without inventing authority." + }, + { + "id": "coordinate-work-003", + "category": "true-positive", + "prompt": "Assign the three investigation tasks to the available agents and track progress.", + "expected_trigger": "Trigger coordinate-work.", + "expected_output": "Separates tasks, owners, statuses, dependencies, blockers, and progress updates." + }, + { + "id": "coordinate-work-004", + "category": "true-positive", + "prompt": "Give me a status view for the active release workstreams.", + "expected_trigger": "Trigger coordinate-work.", + "expected_output": "Summarizes current state by workstream with blockers and next steps." + }, + { + "id": "coordinate-work-005", + "category": "false-positive", + "prompt": "Plan the release from scratch before anyone starts working.", + "expected_trigger": "Do not trigger coordinate-work as the primary skill.", + "expected_output": "Should sequence future work as a plan rather than manage active status." 
+ }, + { + "id": "coordinate-work-006", + "category": "false-positive", + "prompt": "What questions should I ask each team before assigning work?", + "expected_trigger": "Do not trigger coordinate-work as the primary skill.", + "expected_output": "Should generate clarification questions before coordination." + }, + { + "id": "coordinate-work-007", + "category": "non-trigger", + "prompt": "Explain why this rollout needs a feature flag.", + "expected_trigger": "Do not trigger coordinate-work.", + "expected_output": "Should explain the reasoning or mechanism." + }, + { + "id": "coordinate-work-008", + "category": "non-trigger", + "prompt": "Categorize these tasks by frontend, backend, and docs.", + "expected_trigger": "Do not trigger coordinate-work.", + "expected_output": "Should classify tasks by component." + } + ] +} diff --git a/.agents/skills/decide-direction/SKILL.md b/.agents/skills/decide-direction/SKILL.md new file mode 100644 index 0000000..8414f1e --- /dev/null +++ b/.agents/skills/decide-direction/SKILL.md @@ -0,0 +1,71 @@ +--- +name: decide-direction +description: Compare options and recommend a direction. Use for "decide", "choose", "which option", "tradeoffs", "recommend", "should we", and option selection with criteria, risks, and reversibility. +license: MIT +version: 1.0.0 +tags: + - decide + - recommendation + - tradeoffs +author: Oleg Shulyakov +metadata: + catalog: utility +--- + +# decide-direction + +Choose a direction by comparing viable options against explicit criteria. + +## Scope + +**Use this skill when the user wants a recommendation or choice among options.** + +- **Trigger on selection**: use for "decide", "choose", "which option", "tradeoffs", "recommend", "should we", and similar decision requests. +- **State criteria**: compare options against goals, constraints, risk, cost, speed, reversibility, maintenance, user impact, or user-provided criteria. 
+- **Recommend when supported**: choose one option when evidence is sufficient, and say when it is not. +- **Do not just classify**: grouping options is useful only as support for a decision. + +--- + +## Workflow + +**Make the decision criteria visible before the recommendation.** + +1. Identify the decision, options, and constraints. +2. Define criteria, prioritizing user-provided criteria over inferred ones. +3. Remove non-viable options with brief reasons. +4. Compare viable options against the criteria. +5. Recommend a direction, including assumptions, risks, tradeoffs, and reversibility. +6. Name what evidence would change the decision. + +--- + +## Output + +**Give the user a clear recommendation they can accept, reject, or revise.** + +- **Lead with the recommendation**: state the choice when the evidence supports one. +- **Show the basis**: include criteria and concise option comparison. +- **Name tradeoffs**: explain what the recommendation gives up. +- **State reversibility**: note whether the choice is easy to change later. +- **Handle ties honestly**: recommend a tie-breaker or next evidence step when options remain balanced. + +--- + +## Error Paths + +**When criteria or evidence are missing, make the uncertainty part of the decision.** + +- **No criteria**: infer practical criteria and label them as assumptions. +- **No options**: define plausible options before comparing them. +- **Insufficient evidence**: provide a conditional recommendation and the smallest information needed to firm it up. + +--- + +## Verification + +**Check that the recommendation follows from the comparison.** + +- **Criteria alignment**: the selected option should win on the criteria that matter most. +- **No hidden values**: surface subjective preferences and uncertain assumptions. +- **Risk visibility**: include meaningful risks, mitigations, and reversibility. 
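Reviewer note: one way to picture the compare-then-recommend workflow in `decide-direction/SKILL.md` above is a weighted score per option with an explicit tie margin. The skill does not prescribe numeric scoring, so everything here is an assumption made for illustration: the function names, the `tie_margin` parameter, and the idea of numeric weights are all hypothetical.

```python
def rank_options(
    scores: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Rank options by weighted total, highest first.

    `scores` maps option -> criterion -> score; `weights` maps criterion -> weight.
    Both are hypothetical inputs, not something the skill defines.
    """
    totals = {
        option: sum(weights[c] * by_criterion[c] for c in weights)
        for option, by_criterion in scores.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def recommend(scores, weights, tie_margin=0.5):
    """Recommend the top option, or report a tie when the lead is too small.

    Expects at least two options.
    """
    ranked = rank_options(scores, weights)
    (best, best_total), (runner_up, second_total) = ranked[0], ranked[1]
    if best_total - second_total < tie_margin:
        # Handle ties honestly: ask for evidence instead of forcing a pick.
        return f"tie between {best} and {runner_up}: gather more evidence"
    return f"recommend {best}"
```

Keeping the weights as a visible input mirrors the skill's rule that criteria are stated before the recommendation: changing a weight changes the outcome in a way the user can inspect and challenge.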
diff --git a/.agents/skills/decide-direction/evals/evals.json b/.agents/skills/decide-direction/evals/evals.json new file mode 100644 index 0000000..88ecdf4 --- /dev/null +++ b/.agents/skills/decide-direction/evals/evals.json @@ -0,0 +1,60 @@ +{ + "evals": [ + { + "id": "decide-direction-001", + "category": "true-positive", + "prompt": "Decide whether we should build this as a plugin or a skill.", + "expected_trigger": "Trigger decide-direction.", + "expected_output": "States criteria, compares plugin versus skill, recommends one, and notes tradeoffs." + }, + { + "id": "decide-direction-002", + "category": "true-positive", + "prompt": "Which option should we choose for the cache: Redis, Postgres, or in-memory?", + "expected_trigger": "Trigger decide-direction.", + "expected_output": "Compares viable options against criteria and recommends a direction or tie-breaker." + }, + { + "id": "decide-direction-003", + "category": "true-positive", + "prompt": "Recommend whether we should ship this now or wait for the analytics work.", + "expected_trigger": "Trigger decide-direction.", + "expected_output": "Evaluates timing tradeoffs, risks, assumptions, and reversibility before recommending." + }, + { + "id": "decide-direction-004", + "category": "true-positive", + "prompt": "Should we optimize for implementation speed or long-term extensibility here?", + "expected_trigger": "Trigger decide-direction.", + "expected_output": "Defines decision criteria and recommends based on the stated context." + }, + { + "id": "decide-direction-005", + "category": "false-positive", + "prompt": "Classify these options by risk level without recommending one.", + "expected_trigger": "Do not trigger decide-direction as the primary skill.", + "expected_output": "Should organize options by risk category without forcing a recommendation." 
+ }, + { + "id": "decide-direction-006", + "category": "false-positive", + "prompt": "Think through what might be causing users to abandon setup.", + "expected_trigger": "Do not trigger decide-direction as the primary skill.", + "expected_output": "Should reason through hypotheses and uncertainty rather than choose an option." + }, + { + "id": "decide-direction-007", + "category": "non-trigger", + "prompt": "Find where setup completion is tracked in the repo.", + "expected_trigger": "Do not trigger decide-direction.", + "expected_output": "Should inspect local files and cite evidence." + }, + { + "id": "decide-direction-008", + "category": "non-trigger", + "prompt": "Write a user story for password reset.", + "expected_trigger": "Do not trigger decide-direction.", + "expected_output": "Should produce a user story, not a recommendation." + } + ] +} diff --git a/.agents/skills/explain/SKILL.md b/.agents/skills/explain-topic/SKILL.md similarity index 99% rename from .agents/skills/explain/SKILL.md rename to .agents/skills/explain-topic/SKILL.md index a35c45b..7fce122 100644 --- a/.agents/skills/explain/SKILL.md +++ b/.agents/skills/explain-topic/SKILL.md @@ -1,5 +1,5 @@ --- -name: explain +name: explain-topic description: Explain any knowledge topic simply and accurately. Use for "explain X", "why/how/what is X?", concepts, science, definitions, code, design, architecture, and walkthroughs. license: MIT version: 2.1.0 @@ -12,7 +12,7 @@ metadata: catalog: utility --- -# explain +# explain-topic Explain knowledge questions clearly, accurately, and at the right depth. Use simple language first, then add precision only where it helps the user understand. 
diff --git a/.agents/skills/explain/evals/evals.json b/.agents/skills/explain-topic/evals/evals.json similarity index 90% rename from .agents/skills/explain/evals/evals.json rename to .agents/skills/explain-topic/evals/evals.json index 9939557..8a226a1 100644 --- a/.agents/skills/explain/evals/evals.json +++ b/.agents/skills/explain-topic/evals/evals.json @@ -1,5 +1,5 @@ { - "skill_name": "explain", + "skill_name": "explain-topic", "evals": [ { "id": 1, @@ -9,7 +9,7 @@ "expected_output": "Explains Rayleigh scattering in simple terms, gives the short answer first, and avoids unnecessary technical depth.", "files": [], "expectations": [ - "Triggers the explain skill for a general knowledge why question", + "Triggers the explain-topic skill for a general knowledge why question", "Starts with a concise answer", "Explains sunlight scattering by air molecules", "Mentions shorter blue wavelengths scattering more than red wavelengths", @@ -24,7 +24,7 @@ "expected_output": "Defines planet clearly, gives the core criteria, and notes context where definitions vary.", "files": [], "expectations": [ - "Triggers the explain skill for a definition question", + "Triggers the explain-topic skill for a definition question", "Gives a plain-language definition first", "Explains orbiting a star and being rounded by gravity", "Mentions clearing the orbital neighborhood when using the IAU Solar System definition", @@ -39,7 +39,7 @@ "expected_output": "Explains both evolutionary reasons and the physical mechanism of flight in simple language.", "files": [], "expectations": [ - "Triggers the explain skill for a general biology question", + "Triggers the explain-topic skill for a general biology question", "Explains reasons such as food, escape, migration, and nesting", "Explains wings, lift, thrust, and lightweight bodies at a simple level", "Separates why birds evolved flight from how flight works", @@ -54,7 +54,7 @@ "expected_output": "Gives the exact vacuum speed of light and briefly 
explains why it matters.", "files": [], "expectations": [ - "Triggers the explain skill for a factual knowledge question", + "Triggers the explain-topic skill for a factual knowledge question", "States 299,792,458 meters per second in vacuum", "Mentions that light slows in materials", "Briefly explains its role as a fundamental speed limit", @@ -78,13 +78,13 @@ }, { "id": 6, - "query": "/explain user signup", - "prompt": "/explain user signup", + "query": "/explain-topic user signup", + "prompt": "/explain-topic user signup", "should_trigger": true, - "expected_output": "Treats /explain as an explanation request, recognizes signup as a code walkthrough topic, traces the signup path, and explains how data moves through the system.", + "expected_output": "Treats /explain-topic as an explanation request, recognizes signup as a code walkthrough topic, traces the signup path, and explains how data moves through the system.", "files": [], "expectations": [ - "Triggers the explain skill from the /explain phrasing", + "Triggers the explain-topic skill from the /explain-topic phrasing", "Starts from likely signup entry points", "Explains data flow from input through persistence or side effects", "Mentions tests or fixtures when found", @@ -200,10 +200,10 @@ "query": "Implement a new endpoint for exporting audit logs.", "prompt": "Implement a new endpoint for exporting audit logs.", "should_trigger": false, - "expected_output": "Does not route to explain; treats the request as implementation work rather than an explanation or architecture critique.", + "expected_output": "Does not route to explain-topic; treats the request as implementation work rather than an explanation or architecture critique.", "files": [], "expectations": [ - "Does not produce an explain-style architecture walkthrough as the main output", + "Does not produce an explain-topic-style architecture walkthrough as the main output", "Inspects implementation conventions before editing", "Routes to a code 
generation skill or normal implementation workflow", "Adds or updates code and tests when appropriate" diff --git a/.agents/skills/explore-context/SKILL.md b/.agents/skills/explore-context/SKILL.md new file mode 100644 index 0000000..e123022 --- /dev/null +++ b/.agents/skills/explore-context/SKILL.md @@ -0,0 +1,70 @@ +--- +name: explore-context +description: Investigate local repository, document, and attached-artifact context. Use for "explore", "investigate", "find where", "understand this repo", "trace", and local-context research; do not use for web search. +license: MIT +version: 1.0.0 +tags: + - explore + - investigation + - local-context +author: Oleg Shulyakov +metadata: + catalog: utility +--- + +# explore-context + +Investigate local context and report evidence-backed findings. + +## Scope + +**Use this skill for local repository, project document, and attached-artifact investigation.** + +- **Trigger on local research**: use for "explore", "investigate", "find where", "understand this repo", "trace", and requests to inspect local project context. +- **Stay local**: search local files, project docs, attached artifacts, repository history, and available workspace context only. +- **Exclude web research**: do not perform web search, browsing, or current-information research as part of this skill. +- **Report evidence**: ground findings in file references, artifact references, command output, or clearly marked inference. + +--- + +## Workflow + +**Start broad enough to find entry points, then narrow to evidence.** + +1. Identify the target concept, behavior, file, command, error, or workflow. +2. Search names, strings, docs, tests, configuration, and related symbols. +3. Follow call paths, imports, references, generated sources, and tests only as needed. +4. Distinguish live behavior from dead code, examples, fixtures, or stale docs. +5. Summarize findings, gaps, and confidence with references. 
+ +--- + +## Output + +**Make findings traceable and useful for the next action.** + +- **Lead with the answer**: state what was found or not found. +- **Cite local evidence**: include file paths, line references when available, and relevant commands. +- **Separate inference**: label deductions that are not directly stated in files. +- **Name gaps**: call out missing files, inaccessible artifacts, ambiguous ownership, or unverified runtime behavior. +- **Keep scope tight**: do not explain unrelated systems discovered during the search. + +--- + +## Error Paths + +**When evidence is incomplete, report the limits rather than filling gaps with guesses.** + +- **No matches**: say what was searched and suggest the next local search path. +- **Conflicting sources**: prefer runtime wiring and tests over stale docs, and state the conflict. +- **Generated or external code missing**: identify the missing source and how it affects confidence. + +--- + +## Verification + +**Check that every conclusion has local support.** + +- **Reproduce key searches**: use fast local search before relying on memory. +- **Prefer primary files**: cite implementation, tests, configs, or authoritative docs over secondary mentions. +- **No external claims**: leave web or current-information research to an explicit user request outside this skill. diff --git a/.agents/skills/explore-context/evals/evals.json b/.agents/skills/explore-context/evals/evals.json new file mode 100644 index 0000000..cfb8f2b --- /dev/null +++ b/.agents/skills/explore-context/evals/evals.json @@ -0,0 +1,60 @@ +{ + "evals": [ + { + "id": "explore-context-001", + "category": "true-positive", + "prompt": "Explore whether this repo already supports SSO.", + "expected_trigger": "Trigger explore-context.", + "expected_output": "Searches local files and docs, reports findings with file references, and marks gaps." 
+ }, + { + "id": "explore-context-002", + "category": "true-positive", + "prompt": "Investigate where the billing webhook is handled.", + "expected_trigger": "Trigger explore-context.", + "expected_output": "Uses local search and traces relevant files or handlers with evidence." + }, + { + "id": "explore-context-003", + "category": "true-positive", + "prompt": "Find where we define the retry policy and how it is used.", + "expected_trigger": "Trigger explore-context.", + "expected_output": "Finds definitions and call sites, distinguishes live behavior from references, and cites files." + }, + { + "id": "explore-context-004", + "category": "true-positive", + "prompt": "Trace the local code path for user deletion.", + "expected_trigger": "Trigger explore-context.", + "expected_output": "Traces repository code paths with local evidence and uncertainty where needed." + }, + { + "id": "explore-context-005", + "category": "false-positive", + "prompt": "Search the web for the latest pricing of this API provider.", + "expected_trigger": "Do not trigger explore-context.", + "expected_output": "Should use a web/current-information workflow, not local-only exploration." + }, + { + "id": "explore-context-006", + "category": "false-positive", + "prompt": "Explain how the retry policy works conceptually.", + "expected_trigger": "Do not trigger explore-context as the primary skill.", + "expected_output": "Should explain the concept; local inspection is only needed if repo behavior is requested." + }, + { + "id": "explore-context-007", + "category": "non-trigger", + "prompt": "Plan a gradual rollout for SSO.", + "expected_trigger": "Do not trigger explore-context.", + "expected_output": "Should produce phased planning guidance." 
+ }, + { + "id": "explore-context-008", + "category": "non-trigger", + "prompt": "Decide whether SSO should be SAML or OIDC for our customers.", + "expected_trigger": "Do not trigger explore-context.", + "expected_output": "Should compare options and recommend using criteria." + } + ] +} diff --git a/.agents/skills/plan-work/SKILL.md b/.agents/skills/plan-work/SKILL.md new file mode 100644 index 0000000..3f7273c --- /dev/null +++ b/.agents/skills/plan-work/SKILL.md @@ -0,0 +1,71 @@ +--- +name: plan-work +description: Sequence work before execution. Use for "plan", "break this down", "roadmap", "approach", "milestones", "how should we proceed", migration planning, rollout planning, and scoped next steps. +license: MIT +version: 1.0.0 +tags: + - plan + - roadmap + - sequencing +author: Oleg Shulyakov +metadata: + catalog: utility +--- + +# plan-work + +Turn a goal into a practical sequence of work. + +## Scope + +**Use this skill when the user wants an approach, roadmap, milestones, or next-step sequence before execution.** + +- **Trigger on planning**: use for "plan", "break this down", "roadmap", "approach", "milestones", and "how should we proceed". +- **Sequence work**: identify phases, dependencies, assumptions, risks, verification, and immediate next actions. +- **Stay pre-execution**: do not manage live owners, blockers, or handoffs as the primary behavior. +- **Default conversationally**: create durable files only when the user asks or the work clearly needs durable task documentation. + +--- + +## Workflow + +**Build the plan around dependencies and validation.** + +1. Define the goal, scope, and success condition. +2. Identify constraints, assumptions, dependencies, and unknowns. +3. Break work into ordered phases or steps. +4. Add risks and mitigation where failure would be costly. +5. Define verification for each meaningful phase. +6. End with the next concrete action. 
+ +--- + +## Output + +**Use the lightest structure that makes the sequence executable.** + +- **Lead with the approach**: state the overall strategy in one short paragraph. +- **Use phases for larger work**: include purpose, key tasks, dependencies, and validation. +- **Keep steps scoped**: each step should have a visible outcome. +- **Flag blockers**: name missing context that prevents a reliable plan. +- **Avoid fake precision**: do not invent owners, dates, or estimates without evidence. + +--- + +## Error Paths + +**When context is thin, produce a conditional plan instead of pretending the path is fixed.** + +- **Unclear goal**: ask the one question that most affects scope, then provide a provisional outline if useful. +- **Many unknowns**: split discovery from execution and identify what must be learned first. +- **Execution already active**: switch the output toward status, blockers, and handoff only when the user asks to manage ongoing work. + +--- + +## Verification + +**Check that the plan can guide action without overfitting to guesses.** + +- **Trace dependencies**: verify that later steps do not require missing earlier outputs. +- **Define done**: include validation or acceptance checks for non-trivial work. +- **Keep it current**: revise the plan when new facts change scope, risk, or sequence. diff --git a/.agents/skills/plan-work/evals/evals.json b/.agents/skills/plan-work/evals/evals.json new file mode 100644 index 0000000..11a82ff --- /dev/null +++ b/.agents/skills/plan-work/evals/evals.json @@ -0,0 +1,60 @@ +{ + "evals": [ + { + "id": "plan-work-001", + "category": "true-positive", + "prompt": "Plan the migration from the old auth service to the new one.", + "expected_trigger": "Trigger plan-work.", + "expected_output": "Produces phases, dependencies, risks, validation steps, and a next action." 
+ }, + { + "id": "plan-work-002", + "category": "true-positive", + "prompt": "Break this feature into milestones we can implement safely.", + "expected_trigger": "Trigger plan-work.", + "expected_output": "Breaks work into scoped milestones with assumptions and verification." + }, + { + "id": "plan-work-003", + "category": "true-positive", + "prompt": "What's the best approach for rolling this out gradually?", + "expected_trigger": "Trigger plan-work.", + "expected_output": "Outlines a staged rollout with risks, gates, and rollback or validation checks." + }, + { + "id": "plan-work-004", + "category": "true-positive", + "prompt": "How should we proceed with cleaning up this module?", + "expected_trigger": "Trigger plan-work.", + "expected_output": "Defines a practical sequence and notes missing context that could change it." + }, + { + "id": "plan-work-005", + "category": "false-positive", + "prompt": "Coordinate frontend, backend, and QA while this migration is already in progress.", + "expected_trigger": "Do not trigger plan-work as the primary skill.", + "expected_output": "Should manage active workstreams, owners, blockers, and handoffs." + }, + { + "id": "plan-work-006", + "category": "false-positive", + "prompt": "What are the right questions before we plan the migration?", + "expected_trigger": "Do not trigger plan-work as the primary skill.", + "expected_output": "Should generate high-leverage clarification questions first." + }, + { + "id": "plan-work-007", + "category": "non-trigger", + "prompt": "Classify these API endpoints by customer-facing versus internal.", + "expected_trigger": "Do not trigger plan-work.", + "expected_output": "Should group endpoints by the requested criteria." + }, + { + "id": "plan-work-008", + "category": "non-trigger", + "prompt": "Remember that migration plans should include rollback checks.", + "expected_trigger": "Do not trigger plan-work.", + "expected_output": "Should record durable memory if appropriate." 
+ } + ] +} diff --git a/.agents/skills/reason-problem/SKILL.md b/.agents/skills/reason-problem/SKILL.md new file mode 100644 index 0000000..9c5498c --- /dev/null +++ b/.agents/skills/reason-problem/SKILL.md @@ -0,0 +1,69 @@ +--- +name: reason-problem +description: Work through ambiguous problems before a firm output shape is warranted. Use for "reason through", "think through", "brainstorm", "help me frame this", "let's work through this", and messy problem statements. +license: MIT +version: 1.0.0 +tags: + - reason + - framing + - thinking +author: Oleg Shulyakov +metadata: + catalog: utility +--- + +# reason-problem + +Clarify messy problems without forcing a premature answer. + +## Scope + +**Use this skill when the user needs structured thinking more than a final decision or plan.** + +- **Trigger on ambiguity**: use for "reason through", "think through", "brainstorm", "tackle this problem", "help me frame this", "let's work through this", and unclear problem statements. +- **Frame the problem**: clarify terms, goals, constraints, assumptions, competing interpretations, hypotheses, and possible directions. +- **Keep uncertainty visible**: separate facts, assumptions, opinions, and open questions. +- **Avoid premature closure**: do not force a recommendation, step-by-step plan, or implementation unless the user asks for that next. + +--- + +## Workflow + +**Move from confusion to a sharper framing.** + +1. Identify the central tension, ambiguity, or decision pressure. +2. List known facts and explicitly mark assumptions. +3. Name plausible interpretations or hypotheses. +4. Test each direction against constraints, evidence, tradeoffs, and failure modes. +5. End with the clearest current framing and the next useful clarity step. + +--- + +## Output + +**Make the thinking trace useful without dumping private scratchwork.** + +- **Lead with framing**: state what the problem appears to be and why it is ambiguous. 
+- **Show useful structure**: use short sections such as facts, assumptions, hypotheses, tensions, and next clarity step when the problem is complex. +- **Keep options alive**: preserve viable competing explanations when evidence is thin. +- **Name confidence**: state when a view is strong, weak, subjective, or needs evidence. + +--- + +## Error Paths + +**When the user needs a different artifact, say so and produce the closest useful reasoning.** + +- **Missing context**: reason from available facts and identify what would change the framing. +- **Decision requested**: compare options and state that a recommendation depends on criteria when criteria are missing. +- **Planning requested**: outline reasoning about sequence and risks before turning it into steps only if asked. + +--- + +## Verification + +**Check that the answer improves clarity instead of sounding clever.** + +- **No false certainty**: do not hide uncertainty behind confident prose. +- **No generic brainstorming**: tie ideas to the user's constraints and evidence. +- **Clear exit**: end with a sharper framing, candidate direction, or next question. diff --git a/.agents/skills/reason-problem/evals/evals.json b/.agents/skills/reason-problem/evals/evals.json new file mode 100644 index 0000000..d2f84b1 --- /dev/null +++ b/.agents/skills/reason-problem/evals/evals.json @@ -0,0 +1,60 @@ +{ + "evals": [ + { + "id": "reason-problem-001", + "category": "true-positive", + "prompt": "Let's reason through why this workflow feels fragile before we change it.", + "expected_trigger": "Trigger reason-problem.", + "expected_output": "Frames the problem, separates facts from assumptions, names hypotheses, and ends with a clearer next clarity step." 
+ }, + { + "id": "reason-problem-002", + "category": "true-positive", + "prompt": "Help me think through whether this product idea is actually one problem or three.", + "expected_trigger": "Trigger reason-problem.", + "expected_output": "Explores interpretations, constraints, and possible framings without forcing an immediate recommendation." + }, + { + "id": "reason-problem-003", + "category": "true-positive", + "prompt": "Brainstorm possible causes for the drop in activation, but keep uncertainty visible.", + "expected_trigger": "Trigger reason-problem.", + "expected_output": "Generates hypotheses, marks assumptions, and identifies evidence that would distinguish them." + }, + { + "id": "reason-problem-004", + "category": "true-positive", + "prompt": "This architecture debate is messy. Help me frame the core tension.", + "expected_trigger": "Trigger reason-problem.", + "expected_output": "Clarifies competing goals and tradeoffs, then states a sharper problem framing." + }, + { + "id": "reason-problem-005", + "category": "false-positive", + "prompt": "Choose the best database for this workload and recommend one.", + "expected_trigger": "Do not trigger reason-problem as the primary skill.", + "expected_output": "Should use decision criteria and produce a recommendation." + }, + { + "id": "reason-problem-006", + "category": "false-positive", + "prompt": "Break this migration into phases with validation steps.", + "expected_trigger": "Do not trigger reason-problem as the primary skill.", + "expected_output": "Should produce a plan rather than an open-ended reasoning frame." + }, + { + "id": "reason-problem-007", + "category": "non-trigger", + "prompt": "Classify these bugs by severity and affected component.", + "expected_trigger": "Do not trigger reason-problem.", + "expected_output": "Should group items by explicit criteria." 
+ }, + { + "id": "reason-problem-008", + "category": "non-trigger", + "prompt": "Remember that we chose SQLite for local development.", + "expected_trigger": "Do not trigger reason-problem.", + "expected_output": "Should write durable memory if appropriate." + } + ] +} diff --git a/.agents/skills/remember-context/SKILL.md b/.agents/skills/remember-context/SKILL.md new file mode 100644 index 0000000..8f2bd8f --- /dev/null +++ b/.agents/skills/remember-context/SKILL.md @@ -0,0 +1,70 @@ +--- +name: remember-context +description: Preserve durable project facts, decisions, and useful observations in .agents/memory/. Use when the user asks to remember, save context, record a decision, update memory, or preserve a project fact. +license: MIT +version: 1.0.0 +tags: + - remember + - memory + - project-context +author: Oleg Shulyakov +metadata: + catalog: utility +--- + +# remember-context + +Write durable project memory only when it will help future work. + +## Scope + +**Use this skill when the user explicitly asks to preserve project context.** + +- **Trigger on memory requests**: use for "remember", "save context", "record this decision", "update memory", "preserve this", and similar requests. +- **Auto-approve explicit memory**: when the user clearly asks to remember something, write the memory without asking for separate confirmation. +- **Store durable value**: record project facts, decisions, conventions, recurring constraints, implementation observations, and useful handoff facts. +- **Reject low-value memory**: do not store transient chatter, todo noise, sensitive information, unverifiable assumptions as fact, or details already captured better in durable docs. + +--- + +## Workflow + +**Filter for durable usefulness before writing.** + +1. Identify the fact, decision, convention, or observation to preserve. +2. Check whether it is durable, project-relevant, and safe to store. +3. 
Inspect existing `.agents/memory/MEMORY.md` and the current dated memory file when needed to avoid duplication. +4. Write concise notes under the existing memory convention. +5. Report what was recorded and where. + +--- + +## Output + +**Keep memory entries brief, factual, and easy to reuse.** + +- **Use dated notes for task observations**: prefer `.agents/memory/YYYY-MM-DD.md` for day-specific implementation facts. +- **Use durable memory for stable facts**: use `.agents/memory/MEMORY.md` for ongoing project conventions or long-lived decisions when that file's structure supports it. +- **Mark uncertainty**: record assumptions as assumptions, not facts. +- **Avoid secrets**: do not store credentials, private tokens, personal sensitive data, or material the user did not intend to persist. +- **Avoid duplication**: link or summarize existing docs rather than copying large content. + +--- + +## Error Paths + +**When the requested memory is unsafe or not durable, explain the constraint and offer a safer note.** + +- **Sensitive content**: refuse to store secrets and suggest storing the location or policy instead. +- **Transient detail**: explain that it is not worth durable memory unless the user insists and it has future value. +- **Unverifiable claim**: record as "user stated" or ask one clarifying question if writing it as fact would mislead future work. + +--- + +## Verification + +**Confirm the memory is accurate, scoped, and non-duplicative.** + +- **Read before writing**: check relevant existing memory when practical. +- **Keep provenance clear**: distinguish observed repository facts from user-provided decisions. +- **Report the write**: tell the user the file updated and summarize the note. 
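Reviewer sketch for the remember-context write path described above (read existing memory, avoid duplication, then append to either the dated file or `MEMORY.md`). This is a minimal illustration under stated assumptions, not part of this change: the helper name `record_note`, its `durable` flag, the bullet-per-note format, and the plain substring match used as the duplicate check are all hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def record_note(memory_dir: Path, note: str, today: date,
                durable: bool = False) -> Optional[Path]:
    """Append a note to project memory; return None if it is already stored.

    Hypothetical helper: the name, signature, and bullet format are
    assumptions made for illustration only.
    """
    note = note.strip()
    if not note:
        raise ValueError("note must not be empty")

    durable_file = memory_dir / "MEMORY.md"
    dated_file = memory_dir / f"{today.isoformat()}.md"  # YYYY-MM-DD.md

    # Read before writing: skip the write when either file already holds
    # the note (a naive substring match, assumed here for simplicity).
    for existing in (durable_file, dated_file):
        if existing.exists() and note in existing.read_text(encoding="utf-8"):
            return None

    # Stable facts go to MEMORY.md; day-specific observations go to the dated file.
    target = durable_file if durable else dated_file
    memory_dir.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        handle.write(f"- {note}\n")
    return target
```

The returned path gives the caller what it needs to report the write back to the user. Filtering for secrets and durability is deliberately left out, since the skill treats those as judgment calls made before any write happens.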
diff --git a/.agents/skills/remember-context/evals/evals.json b/.agents/skills/remember-context/evals/evals.json new file mode 100644 index 0000000..3483bb3 --- /dev/null +++ b/.agents/skills/remember-context/evals/evals.json @@ -0,0 +1,60 @@ +{ + "evals": [ + { + "id": "remember-context-001", + "category": "true-positive", + "prompt": "Remember that we chose skills over plugins for this workflow.", + "expected_trigger": "Trigger remember-context.", + "expected_output": "Writes a concise durable memory note without asking for extra confirmation." + }, + { + "id": "remember-context-002", + "category": "true-positive", + "prompt": "Save context: this repo keeps task-specific docs under docs/YYYY-MM-DD-task-name/.", + "expected_trigger": "Trigger remember-context.", + "expected_output": "Records the durable convention in the appropriate memory file if not already present." + }, + { + "id": "remember-context-003", + "category": "true-positive", + "prompt": "Record the decision that explicit memory requests are auto-approved.", + "expected_trigger": "Trigger remember-context.", + "expected_output": "Persists the decision as durable project memory and reports the file updated." + }, + { + "id": "remember-context-004", + "category": "true-positive", + "prompt": "Update memory with the fact that explore-context is local-only and never web search.", + "expected_trigger": "Trigger remember-context.", + "expected_output": "Writes a brief project fact while avoiding duplication if already stored." + }, + { + "id": "remember-context-005", + "category": "false-positive", + "prompt": "Plan how we should capture project decisions going forward.", + "expected_trigger": "Do not trigger remember-context as the primary skill.", + "expected_output": "Should produce a plan; memory writes require an explicit thing to preserve." 
+ }, + { + "id": "remember-context-006", + "category": "false-positive", + "prompt": "Explain how the memory files are organized.", + "expected_trigger": "Do not trigger remember-context as the primary skill.", + "expected_output": "Should explain the convention, not write memory." + }, + { + "id": "remember-context-007", + "category": "non-trigger", + "prompt": "Here is my API token, keep it handy for later.", + "expected_trigger": "Do not write sensitive memory.", + "expected_output": "Should refuse to store secrets and suggest a safer handling pattern." + }, + { + "id": "remember-context-008", + "category": "non-trigger", + "prompt": "Find where the project memory loader is implemented.", + "expected_trigger": "Do not trigger remember-context.", + "expected_output": "Should inspect local repository context and cite evidence." + } + ] +} diff --git a/docs/2026-05-20-general-agent-skills/PRD.md b/docs/2026-05-20-general-agent-skills/PRD.md index f0b97b3..63cac3b 100644 --- a/docs/2026-05-20-general-agent-skills/PRD.md +++ b/docs/2026-05-20-general-agent-skills/PRD.md @@ -32,7 +32,7 @@ Without these skills, the agent has to infer these broad behaviors from generic | Goal ID | Target Outcome | Success Metric | | --- | --- | --- | -| G-1 | Provide a minimal general skill set for common agent collaboration modes. | Nine skills exist: `ask`, `explain`, `reason`, `classify`, `plan`, `explore`, `decide`, `coordinate`, and `remember`. | +| G-1 | Provide a minimal general skill set for common agent collaboration modes. | Nine skills exist: `ask-questions`, `explain-topic`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, and `remember-context`. | | G-2 | Keep every skill independently installable. | Each skill works at runtime without requiring, naming, or delegating to another skill. | | G-3 | Make trigger behavior predictable. 
| Each skill has explicit trigger phrases, exclusions, and at least 7 representative eval prompts: 3 true-positive, 2 false-positive, and 2 non-trigger prompts. | | G-4 | Keep each skill lightweight and reusable across repositories. | Each `SKILL.md` stays under 500 lines and uses references only when needed. | @@ -51,15 +51,15 @@ Without these skills, the agent has to infer these broad behaviors from generic ### In Scope -- `ask`: Identify useful questions, missing context, hidden assumptions, and the smallest clarifications needed to move forward. -- `explain`: Explain concepts, code, architecture, behavior, tradeoffs, and decisions in clear terms matched to the user's question. -- `reason`: Work through ambiguous problems by clarifying terms, surfacing assumptions, generating hypotheses, testing arguments, and shaping a clearer framing. -- `classify`: Organize items, ideas, observations, requirements, examples, files, risks, or options into meaningful categories by similarity, difference, type, abstraction level, priority, dependency, or other explicit criteria. -- `plan`: Turn a goal into a scoped plan, milestones, risks, sequencing, and next actions. -- `explore`: Investigate local code, project docs, attached artifacts, and repository context when local research is needed. -- `decide`: Compare options and recommend a course of action with tradeoffs, assumptions, and decision criteria. -- `coordinate`: Manage multi-step or multi-agent work by tracking goals, owners, dependencies, status, blockers, handoffs, and next actions. -- `remember`: Capture durable project facts, decisions, and useful observations in `.agents/memory/`. +- `ask-questions`: Identify useful questions, missing context, hidden assumptions, and the smallest clarifications needed to move forward. +- `explain-topic`: Explain concepts, code, architecture, behavior, tradeoffs, and decisions in clear terms matched to the user's question. 
+- `reason-problem`: Work through ambiguous problems by clarifying terms, surfacing assumptions, generating hypotheses, testing arguments, and shaping a clearer framing. +- `classify-content`: Organize items, ideas, observations, requirements, examples, files, risks, or options into meaningful categories by similarity, difference, type, abstraction level, priority, dependency, or other explicit criteria. +- `plan-work`: Turn a goal into a scoped plan, milestones, risks, sequencing, and next actions. +- `explore-context`: Investigate local code, project docs, attached artifacts, and repository context when local research is needed. +- `decide-direction`: Compare options and recommend a course of action with tradeoffs, assumptions, and decision criteria. +- `coordinate-work`: Manage multi-step or multi-agent work by tracking goals, owners, dependencies, status, blockers, handoffs, and next actions. +- `remember-context`: Capture durable project facts, decisions, and useful observations in `.agents/memory/`. - Trigger and exclusion guidance for each skill. - Acceptance criteria and eval prompts for skill behavior. - Standalone installation guidance for each skill. @@ -67,7 +67,7 @@ Without these skills, the agent has to infer these broad behaviors from generic ### Out of Scope - Live integrations with Jira, Linear, Confluence, GitHub Issues, or external memory stores. -- Web search, web browsing, or external/current-information research inside `explore`. +- Web search, web browsing, or external/current-information research inside `explore-context`. - Automatic memory writes without user intent or clearly durable project value. - Replacing project instructions in `AGENTS.md`. @@ -82,15 +82,15 @@ Without these skills, the agent has to infer these broad behaviors from generic | Requirement ID | Capability / Feature | Priority | Acceptance Criteria | Tracker | | --- | --- | --- | --- | --- | -| FR-1 | Define `ask` as the skill for question generation and clarification. 
| MUST | Triggers on “ask”, “what should I ask”, “what are the right questions”, “what are we missing”, “clarify this”, and ambiguous requests where progress depends on missing context. Produces a minimal, prioritized set of useful questions, assumptions, and context gaps. | TBD | -| FR-2 | Define `explain` as the skill for clarification and teaching. | MUST | Triggers on “explain”, “what is”, “why”, “how does”, “walk me through”, and code/concept explanation requests. Produces clear explanations matched to the user's context and desired depth. | TBD | -| FR-3 | Define `reason` as the skill for working through ambiguous problems. | MUST | Triggers on “reason through”, “think through”, “brainstorm”, “tackle this problem”, “help me frame this”, “let’s work through this”, and messy problem statements where the desired output is not yet clear. Clarifies terms, assumptions, constraints, possible explanations, and candidate directions without forcing a premature decision or plan. | TBD | -| FR-4 | Define `classify` as the skill for organizing material into meaningful groups. | MUST | Triggers on “classify”, “categorize”, “group”, “cluster”, “sort”, “taxonomy”, “organize these”, and requests to group items by similarity, difference, category, priority, dependency, abstraction level, or other explicit criteria. Produces labeled groups, grouping criteria, notable edge cases, and items that do not clearly fit. | TBD | -| FR-5 | Define `plan` as the skill for sequencing work before execution. | MUST | Triggers on “plan”, “break this down”, “roadmap”, “approach”, “milestones”, and “how should we proceed”. Produces scoped steps, risks, assumptions, and verification strategy when relevant. | TBD | -| FR-6 | Define `explore` as the skill for local investigation and repository research. | MUST | Triggers on “explore”, “investigate”, “find where”, “understand this repo”, “trace”, and local-context research requests. 
Covers local code, project docs, attached artifacts, and repository context only. Produces findings with file references, artifact references, or uncertainty clearly marked. | TBD | -| FR-7 | Define `decide` as the skill for choosing among options. | MUST | Triggers on “decide”, “choose”, “which option”, “tradeoffs”, “recommend”, and “should we”. States decision criteria, compares viable options, recommends one, and identifies reversibility or risk. | TBD | -| FR-8 | Define `coordinate` as the skill for managing active work across people, agents, tasks, and dependencies. | MUST | Triggers on “coordinate”, “manage this work”, “team lead”, “lead this”, “assign”, “delegate”, “track blockers”, “status”, “handoff”, and multi-agent or multi-workstream requests. Maintains an execution view with goals, owners, dependencies, current status, blockers, and next actions. | TBD | -| FR-9 | Define `remember` as the skill for durable project memory. | MUST | Triggers when the user asks to remember, save context, record a decision, update memory, or preserve a project fact. When the user explicitly asks to remember something, the memory write is auto-approved and should proceed without asking again. Writes only durable facts, decisions, and observations to `.agents/memory/` according to project conventions. Avoids storing transient task chatter or unverifiable assumptions as fact. | TBD | +| FR-1 | Define `ask-questions` as the skill for question generation and clarification. | MUST | Triggers on “ask”, “what should I ask”, “what are the right questions”, “what are we missing”, “clarify this”, and ambiguous requests where progress depends on missing context. Produces a minimal, prioritized set of useful questions, assumptions, and context gaps. | TBD | +| FR-2 | Define `explain-topic` as the skill for clarification and teaching. | MUST | Triggers on “explain”, “what is”, “why”, “how does”, “walk me through”, and code/concept explanation requests. 
Produces clear explanations matched to the user's context and desired depth. | TBD | +| FR-3 | Define `reason-problem` as the skill for working through ambiguous problems. | MUST | Triggers on “reason through”, “think through”, “brainstorm”, “tackle this problem”, “help me frame this”, “let’s work through this”, and messy problem statements where the desired output is not yet clear. Clarifies terms, assumptions, constraints, possible explanations, and candidate directions without forcing a premature decision or plan. | TBD | +| FR-4 | Define `classify-content` as the skill for organizing material into meaningful groups. | MUST | Triggers on “classify-content”, “categorize”, “group”, “cluster”, “sort”, “taxonomy”, “organize these”, and requests to group items by similarity, difference, category, priority, dependency, abstraction level, or other explicit criteria. Produces labeled groups, grouping criteria, notable edge cases, and items that do not clearly fit. | TBD | +| FR-5 | Define `plan-work` as the skill for sequencing work before execution. | MUST | Triggers on “plan-work”, “break this down”, “roadmap”, “approach”, “milestones”, and “how should we proceed”. Produces scoped steps, risks, assumptions, and verification strategy when relevant. | TBD | +| FR-6 | Define `explore-context` as the skill for local investigation and repository research. | MUST | Triggers on “explore-context”, “investigate”, “find where”, “understand this repo”, “trace”, and local-context research requests. Covers local code, project docs, attached artifacts, and repository context only. Produces findings with file references, artifact references, or uncertainty clearly marked. | TBD | +| FR-7 | Define `decide-direction` as the skill for choosing among options. | MUST | Triggers on “decide-direction”, “choose”, “which option”, “tradeoffs”, “recommend”, and “should we”. States decision criteria, compares viable options, recommends one, and identifies reversibility or risk. 
| TBD | +| FR-8 | Define `coordinate-work` as the skill for managing active work across people, agents, tasks, and dependencies. | MUST | Triggers on “coordinate-work”, “manage this work”, “team lead”, “lead this”, “assign”, “delegate”, “track blockers”, “status”, “handoff”, and multi-agent or multi-workstream requests. Maintains an execution view with goals, owners, dependencies, current status, blockers, and next actions. | TBD | +| FR-9 | Define `remember-context` as the skill for durable project memory. | MUST | Triggers when the user asks to remember, save context, record a decision, update memory, or preserve a project fact. When the user explicitly asks to remember something, the memory write is auto-approved and should proceed without asking again. Writes only durable facts, decisions, and observations to `.agents/memory/` according to project conventions. Avoids storing transient task chatter or unverifiable assumptions as fact. | TBD | | FR-10 | Document standalone runtime boundaries. | MUST | Each skill defines its own purpose, trigger phrases, non-trigger cases, expected behavior, and output shape without requiring, naming, or delegating to another skill at runtime. | TBD | | FR-11 | Add behavior evals. | SHOULD | Each skill has at least 7 representative prompts: 3 true-positive prompts, 2 false-positive prompts, and 2 non-trigger prompts. | TBD | @@ -101,14 +101,14 @@ Without these skills, the agent has to infer these broad behaviors from generic | NFR ID | Category | Target Specification | | --- | --- | --- | | NFR-1 | Maintainability | Each skill has one clear workflow and avoids becoming a dumping ground for generic agent behavior. | -| NFR-2 | Portability | Skills work across repositories and do not assume this repository layout except where `remember` explicitly uses `.agents/memory/`. Runtime behavior must not depend on any other skill being installed. 
| +| NFR-2 | Portability | Skills work across repositories and do not assume this repository layout except where `remember-context` explicitly uses `.agents/memory/`. Runtime behavior must not depend on any other skill being installed. | | NFR-3 | Token Efficiency | Main `SKILL.md` files stay concise; long examples or eval details move to references only when they reduce ambiguity. | -| NFR-4 | Question Quality | `ask` must prefer the fewest high-leverage questions over exhaustive questionnaires. | -| NFR-5 | Reasoning Quality | `reason` must expose assumptions, uncertainty, and competing interpretations instead of presenting guesses as settled conclusions. | -| NFR-6 | Classification Quality | `classify` must state the grouping criteria and preserve ambiguous or multi-fit items instead of forcing every item into a clean bucket. | -| NFR-7 | Source Discipline | `explore` must cite local files, project docs, or attached artifacts and distinguish verified repository facts from inference. It must not perform web search or browsing. | -| NFR-8 | Memory Hygiene | `remember` must preserve useful context without duplicating docs or storing sensitive/transient information. | -| NFR-9 | Coordination Clarity | `coordinate` must keep status, owners, blockers, and next actions explicit enough that another agent or human can continue the work. | +| NFR-4 | Question Quality | `ask-questions` must prefer the fewest high-leverage questions over exhaustive questionnaires. | +| NFR-5 | Reasoning Quality | `reason-problem` must expose assumptions, uncertainty, and competing interpretations instead of presenting guesses as settled conclusions. | +| NFR-6 | Classification Quality | `classify-content` must state the grouping criteria and preserve ambiguous or multi-fit items instead of forcing every item into a clean bucket. 
| +| NFR-7 | Source Discipline | `explore-context` must cite local files, project docs, or attached artifacts and distinguish verified repository facts from inference. It must not perform web search or browsing. | +| NFR-8 | Memory Hygiene | `remember-context` must preserve useful context without duplicating docs or storing sensitive/transient information. | +| NFR-9 | Coordination Clarity | `coordinate-work` must keep status, owners, blockers, and next actions explicit enough that another agent or human can continue the work. | --- @@ -125,15 +125,15 @@ Without these skills, the agent has to infer these broad behaviors from generic ## User Journeys / Key Flows -1. A user asks, “What should we ask before committing to this approach?” The agent uses `ask` to produce a short set of high-leverage questions, assumptions, and missing context. -2. A user asks, “Explain how this auth flow works.” The agent uses `explain`, reads the relevant code only if needed, and returns a clear explanation with file references when applicable. -3. A user asks, “Let’s think through why this workflow feels fragile.” The agent uses `reason` to clarify the problem, surface assumptions, generate hypotheses, and identify what would make the situation clearer. -4. A user asks, “Classify these feature requests by underlying user need.” The agent uses `classify` to define grouping criteria, label groups, place items, and flag ambiguous cases. -5. A user asks, “Let’s plan the migration.” The agent uses `plan` to identify phases, risks, verification, assumptions, and next actions. -6. A user asks, “Explore whether we already support this.” The agent uses `explore`, searches local files, project docs, and attached artifacts, then reports findings and gaps. -7. A user asks, “Should we build this as a plugin or a skill?” The agent uses `decide`, compares options against explicit criteria, and recommends one. -8. 
A user asks, “Lead this migration across frontend, backend, and tests.” The agent uses `coordinate` to track workstreams, owners, dependencies, blockers, status, and handoffs. -9. A user asks, “Remember that we chose skills over plugins for this.” The agent uses `remember`, records the durable decision in the appropriate memory file, and keeps the note concise. +1. A user asks, “What should we ask before committing to this approach?” The agent uses `ask-questions` to produce a short set of high-leverage questions, assumptions, and missing context. +2. A user asks, “Explain how this auth flow works.” The agent uses `explain-topic`, reads the relevant code only if needed, and returns a clear explanation with file references when applicable. +3. A user asks, “Let’s think through why this workflow feels fragile.” The agent uses `reason-problem` to clarify the problem, surface assumptions, generate hypotheses, and identify what would make the situation clearer. +4. A user asks, “Classify these feature requests by underlying user need.” The agent uses `classify-content` to define grouping criteria, label groups, place items, and flag ambiguous cases. +5. A user asks, “Let’s plan the migration.” The agent uses `plan-work` to identify phases, risks, verification, assumptions, and next actions. +6. A user asks, “Explore whether we already support this.” The agent uses `explore-context`, searches local files, project docs, and attached artifacts, then reports findings and gaps. +7. A user asks, “Should we build this as a plugin or a skill?” The agent uses `decide-direction`, compares options against explicit criteria, and recommends one. +8. A user asks, “Lead this migration across frontend, backend, and tests.” The agent uses `coordinate-work` to track workstreams, owners, dependencies, blockers, status, and handoffs. +9. 
A user asks, “Remember that we chose skills over plugins for this.” The agent uses `remember-context`, records the durable decision in the appropriate memory file, and keeps the note concise. --- @@ -142,14 +142,14 @@ Without these skills, the agent has to infer these broad behaviors from generic | Risk ID | Assumption / Risk Description | Impact | Mitigation Strategy | Status | | --- | --- | --- | --- | --- | | R-1 | General skills could become too broad and behave inconsistently. | HIGH | Define clear trigger phrases, non-trigger cases, and output expectations in every skill. | OPEN | -| R-2 | `explore` could be mistaken for web research. | MEDIUM | Define `explore` as local and artifact investigation only; external/current research remains out of scope for this skill. | OPEN | -| R-3 | `remember` could accumulate low-value notes. | MEDIUM | Require durability criteria before writing memory. | OPEN | -| R-4 | `decide` recommendations may hide subjective criteria. | MEDIUM | Require explicit criteria, assumptions, and reversibility notes. | OPEN | +| R-2 | `explore-context` could be mistaken for web research. | MEDIUM | Define `explore-context` as local and artifact investigation only; external/current research remains out of scope for this skill. | OPEN | +| R-3 | `remember-context` could accumulate low-value notes. | MEDIUM | Require durability criteria before writing memory. | OPEN | +| R-4 | `decide-direction` recommendations may hide subjective criteria. | MEDIUM | Require explicit criteria, assumptions, and reversibility notes. | OPEN | | R-5 | Eval coverage may be too small to catch trigger conflicts. | MEDIUM | Include false-positive and non-trigger prompts, not only happy-path prompts. | OPEN | -| R-6 | `coordinate` could overlap with `plan`. | MEDIUM | Define `plan` as pre-execution sequencing and `coordinate` as active coordination across workstreams, owners, blockers, and handoffs. 
| OPEN | -| R-7 | `reason` could become vague brainstorming without useful output. | MEDIUM | Require a clear problem framing, assumptions, hypotheses or options, and suggested next clarity step. | OPEN | -| R-8 | `ask` could become an endless questionnaire. | MEDIUM | Require prioritized questions and a bias toward the smallest question set that changes the next action. | OPEN | -| R-9 | `classify` could force false precision. | MEDIUM | Require explicit grouping criteria, ambiguous cases, and optional multi-label classifications when needed. | OPEN | +| R-6 | `coordinate-work` could overlap with `plan-work`. | MEDIUM | Define `plan-work` as pre-execution sequencing and `coordinate-work` as active coordination across workstreams, owners, blockers, and handoffs. | OPEN | +| R-7 | `reason-problem` could become vague brainstorming without useful output. | MEDIUM | Require a clear problem framing, assumptions, hypotheses or options, and suggested next clarity step. | OPEN | +| R-8 | `ask-questions` could become an endless questionnaire. | MEDIUM | Require prioritized questions and a bias toward the smallest question set that changes the next action. | OPEN | +| R-9 | `classify-content` could force false precision. | MEDIUM | Require explicit grouping criteria, ambiguous cases, and optional multi-label classifications when needed. 
| OPEN | --- @@ -158,7 +158,7 @@ Without these skills, the agent has to infer these broad behaviors from generic | Dependency ID | Item | Impacted Requirements | Validation Owner | | --- | --- | --- | --- | | D-1 | Existing `.agents/memory/` conventions | FR-9, NFR-8 | Oleg Shulyakov | -| D-2 | `creator-skill` validation workflow for development-time checks only | FR-11 | TBD | +| D-2 | `create-skill` validation workflow for development-time checks only | FR-11 | TBD | --- @@ -166,13 +166,13 @@ Without these skills, the agent has to infer these broad behaviors from generic | Question ID | Question | Answer / Decision | Owner | Resolution Date | | --- | --- | --- | --- | --- | -| Q-1 | Should `explore` include web research by default, or only when the user asks or current information matters? | Decided: no. `explore` covers local code, project docs, attached artifacts, and repository context only. Web search and browsing are out of scope. | Oleg Shulyakov | 2026-05-21 | -| Q-2 | Should `remember` ask before writing memory, or write automatically when the user explicitly says “remember”? | Decided: explicit user requests to remember are auto-approved. The skill should write without asking again, while still filtering for durable project value and avoiding sensitive, transient, or unverifiable notes. | Oleg Shulyakov | 2026-05-21 | -| Q-3 | Should these skills use neutral names (`plan`) or prefix-first names (`planner-general`, `research-general`)? | Decided: keep simple names because they are standalone cognitive modes. | Oleg Shulyakov | 2026-05-21 | -| Q-4 | Should `plan` create task files, or only produce conversational plans unless paired with another writing skill? | Decided: conversational by default; durable task files require explicit user request or substantial work. | Oleg Shulyakov | 2026-05-21 | -| Q-5 | Should the coordination skill be named `coordinate`, `lead`, or `manage`? 
| Decided: `coordinate`, because it is plain, action-oriented, and covers team-leading, delegation, status, and multi-agent coordination without implying people-management authority. | Oleg Shulyakov | 2026-05-21 | -| Q-6 | Should the ambiguous-problem skill be named `reason`, `think`, or `brainstorm`? | Decided: `reason`, because it covers brainstorming, framing, assumptions, and argument-testing without being limited to idea generation. | Oleg Shulyakov | 2026-05-21 | -| Q-7 | Should question generation be its own skill or part of `reason`? | Decided: keep `ask` separate because identifying the right questions is a distinct output and often useful before any reasoning path is chosen. | Oleg Shulyakov | 2026-05-21 | -| Q-8 | Should grouping be named `classify`, `sort`, or `categorize`? | Decided: `classify`, because it covers category assignment, similarity/difference grouping, taxonomies, and edge cases more precisely than `sort`. | Oleg Shulyakov | 2026-05-21 | +| Q-1 | Should `explore-context` include web research by default, or only when the user asks or current information matters? | Decided: no. `explore-context` covers local code, project docs, attached artifacts, and repository context only. Web search and browsing are out of scope. | Oleg Shulyakov | 2026-05-21 | +| Q-2 | Should `remember-context` ask before writing memory, or write automatically when the user explicitly says “remember-context”? | Decided: explicit user requests to remember-context are auto-approved. The skill should write without asking again, while still filtering for durable project value and avoiding sensitive, transient, or unverifiable notes. | Oleg Shulyakov | 2026-05-21 | +| Q-3 | Should these skills use neutral names (`plan-work`) or verb-first names (`plan-general`, `research-context`)? | Decided: keep simple names because they are standalone cognitive modes. 
| Oleg Shulyakov | 2026-05-21 | +| Q-4 | Should `plan-work` create task files, or only produce conversational plans unless paired with another writing skill? | Decided: conversational by default; durable task files require explicit user request or substantial work. | Oleg Shulyakov | 2026-05-21 | +| Q-5 | Should the coordination skill be named `coordinate-work`, `lead`, or `manage`? | Decided: `coordinate-work`, because it is plain, action-oriented, and covers team-leading, delegation, status, and multi-agent coordination without implying people-management authority. | Oleg Shulyakov | 2026-05-21 | +| Q-6 | Should the ambiguous-problem skill be named `reason-problem`, `think`, or `brainstorm`? | Decided: `reason-problem`, because it covers brainstorming, framing, assumptions, and argument-testing without being limited to idea generation. | Oleg Shulyakov | 2026-05-21 | +| Q-7 | Should question generation be its own skill or part of `reason-problem`? | Decided: keep `ask-questions` separate because identifying the right questions is a distinct output and often useful before any reasoning path is chosen. | Oleg Shulyakov | 2026-05-21 | +| Q-8 | Should grouping be named `classify-content`, `sort`, or `categorize`? | Decided: `classify-content`, because it covers category assignment, similarity/difference grouping, taxonomies, and edge cases more precisely than `sort`. | Oleg Shulyakov | 2026-05-21 | --- diff --git a/docs/2026-05-20-general-agent-skills/SPEC.md b/docs/2026-05-20-general-agent-skills/SPEC.md index e97f9e4..327eac3 100644 --- a/docs/2026-05-20-general-agent-skills/SPEC.md +++ b/docs/2026-05-20-general-agent-skills/SPEC.md @@ -16,7 +16,7 @@ ### 1.1 Purpose -This spec defines the implementation contract for nine standalone, general-purpose agent skills: `ask`, `explain`, `reason`, `classify`, `plan`, `explore`, `decide`, `coordinate`, and `remember`. 
+This spec defines the implementation contract for nine standalone, general-purpose agent skills: `ask-questions`, `explain-topic`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, and `remember-context`.

 The goal is to make common collaboration modes predictable at runtime without requiring any skill to depend on another installed skill.

@@ -24,7 +24,7 @@ The goal is to make common collaboration modes predictable at runtime without re

 The existing skill library covers specialized creation, coding, review, documentation, and operations workflows. It has a remaining gap for project-agnostic thinking modes that recur across repositories and tasks.

-Without explicit skills for these modes, the agent must infer broad behavior from generic instructions each time. That creates inconsistent trigger behavior, unclear output shapes, and accidental overlap between similar modes such as `plan` and `coordinate`, or `ask` and `reason`.
+Without explicit skills for these modes, the agent must infer broad behavior from generic instructions each time. That creates inconsistent trigger behavior, unclear output shapes, and accidental overlap between similar modes such as `plan-work` and `coordinate-work`, or `ask-questions` and `reason-problem`.

 This work creates a small general layer with clear trigger boundaries, exclusions, expected behavior, and eval prompts for each skill.
@@ -47,14 +47,14 @@ Success means a user can install any one of the nine skills independently and ge

 | Goal | Success Metric | Target |
 | --- | --- | --- |
-| Minimal general skill set | Nine skills exist with approved names | `ask`, `explain`, `reason`, `classify`, `plan`, `explore`, `decide`, `coordinate`, `remember` |
+| Minimal general skill set | Nine skills exist with approved names | `ask-questions`, `explain-topic`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, `remember-context` |
 | Standalone runtime behavior | No skill requires another skill to be installed, named, or delegated to | 100% of skills |
 | Predictable triggers | Each skill documents triggers, exclusions, expected behavior, and eval prompts | 8-10 eval prompts where possible, never fewer than 7 |
 | Lightweight packaging | Main `SKILL.md` files remain concise | Under 500 lines each |

 ### 1.6 Non-Goals

-This work does not add live integrations with Jira, Linear, Confluence, GitHub Issues, or external memory stores. It does not add web search behavior to `explore`, automatic memory writes without durable value, a shared trigger-overlap eval harness, or replacements for project-level `AGENTS.md` instructions.
+This work does not add live integrations with Jira, Linear, Confluence, GitHub Issues, or external memory stores. It does not add web search behavior to `explore-context`, automatic memory writes without durable value, a shared trigger-overlap eval harness, or replacements for project-level `AGENTS.md` instructions.

 ---

@@ -82,7 +82,7 @@ Each skill folder shall include:
 └── evals.json
 ```

-Evaluation run results, when generated, shall be stored under `evals/iterations/iteration-N/` according to the `creator-skill` workflow. A `references/` folder may be added only when it contains useful supporting files, such as examples, detailed procedures, or compatibility notes that would make `SKILL.md` too long or less readable. Do not create placeholder `references/` folders.
+Evaluation run results, when generated, shall be stored under `evals/iterations/iteration-N/` according to the `create-skill` workflow. A `references/` folder may be added only when it contains useful supporting files, such as examples, detailed procedures, or compatibility notes that would make `SKILL.md` too long or less readable. Do not create placeholder `references/` folders.

 ### 2.3 Skill Metadata Contract

@@ -106,36 +106,36 @@ Each skill body shall define purpose, scope, trigger cases, non-trigger cases, w

 ### 2.4 Per-Skill Behavior Requirements

-#### FR-001: `ask`
+#### FR-001: `ask-questions`

 **Priority:** Must-have
-**Description:** The system shall use `ask` for question generation, clarification, missing-context discovery, and assumption surfacing.
+**Description:** The system shall use `ask-questions` for question generation, clarification, missing-context discovery, and assumption surfacing.

 **Acceptance criteria:**

-- [ ] Triggers on "ask", "what should I ask", "right questions", "what are we missing", "clarify this", and ambiguous requests blocked by missing context.
+- [ ] Triggers on "ask-questions", "what should I ask", "right questions", "what are we missing", "clarify this", and ambiguous requests blocked by missing context.
 - [ ] Produces a minimal prioritized set of high-leverage questions.
 - [ ] States assumptions and context gaps when useful.
 - [ ] Avoids exhaustive questionnaires unless explicitly requested.
 - [ ] Does not make decisions, plans, or implementation changes as its primary output.

-#### FR-002: `explain`
+#### FR-002: `explain-topic`

 **Priority:** Must-have
-**Description:** The system shall use `explain` for teaching, clarification, walkthroughs, concepts, code behavior, architecture, tradeoffs, and decisions.
+**Description:** The system shall use `explain-topic` for teaching, clarification, walkthroughs, concepts, code behavior, architecture, tradeoffs, and decisions.

 **Acceptance criteria:**

-- [ ] Triggers on "explain", "what is", "why", "how does", "walk me through", and direct explanation requests.
+- [ ] Triggers on "explain-topic", "what is", "why", "how does", "walk me through", and direct explanation requests.
 - [ ] Matches depth to the user's question and available context.
 - [ ] For code explanations, inspects relevant local files before describing repository behavior.
 - [ ] Marks uncertainty when evidence is incomplete.
 - [ ] Does not implement, review, or plan unless the user asks for that additional work.

-#### FR-003: `reason`
+#### FR-003: `reason-problem`

 **Priority:** Must-have
-**Description:** The system shall use `reason` to work through ambiguous problems before a firm output shape, decision, or plan is warranted.
+**Description:** The system shall use `reason-problem` to work through ambiguous problems before a firm output shape, decision, or plan is warranted.

 **Acceptance criteria:**

@@ -145,80 +145,80 @@ Each skill body shall define purpose, scope, trigger cases, non-trigger cases, w
 - [ ] Ends with a clearer framing or next clarity step.
 - [ ] Distinguishes facts, assumptions, and opinions.

-#### FR-004: `classify`
+#### FR-004: `classify-content`

 **Priority:** Must-have
-**Description:** The system shall use `classify` to organize material into meaningful groups.
+**Description:** The system shall use `classify-content` to organize material into meaningful groups.

 **Acceptance criteria:**

-- [ ] Triggers on "classify", "categorize", "group", "cluster", "sort", "taxonomy", "organize these", and requests to group by explicit criteria.
+- [ ] Triggers on "classify-content", "categorize", "group", "cluster", "sort", "taxonomy", "organize these", and requests to group by explicit criteria.
 - [ ] States grouping criteria before or alongside the classification.
 - [ ] Labels groups clearly and places items into them.
 - [ ] Preserves ambiguous, multi-fit, or unclassified items instead of forcing false precision.
 - [ ] Supports grouping by similarity, difference, category, priority, dependency, abstraction level, or user-provided criteria.

-#### FR-005: `plan`
+#### FR-005: `plan-work`

 **Priority:** Must-have
-**Description:** The system shall use `plan` to sequence work before execution.
+**Description:** The system shall use `plan-work` to sequence work before execution.

 **Acceptance criteria:**

-- [ ] Triggers on "plan", "break this down", "roadmap", "approach", "milestones", and "how should we proceed".
+- [ ] Triggers on "plan-work", "break this down", "roadmap", "approach", "milestones", and "how should we proceed".
 - [ ] Produces scoped steps, milestones, dependencies, assumptions, risks, and verification strategy when relevant.
 - [ ] Defaults to conversational planning unless durable files are explicitly requested or the task clearly needs them.
-- [ ] Does not coordinate live owners, blockers, handoffs, or active workstreams as its primary behavior.
+- [ ] Does not coordinate-work live owners, blockers, handoffs, or active workstreams as its primary behavior.
 - [ ] Identifies when more context is required before a reliable plan can be made.

-#### FR-006: `explore`
+#### FR-006: `explore-context`

 **Priority:** Must-have
-**Description:** The system shall use `explore` for local repository, local document, and attached-artifact investigation.
+**Description:** The system shall use `explore-context` for local repository, local document, and attached-artifact investigation.

 **Acceptance criteria:**

-- [ ] Triggers on "explore", "investigate", "find where", "understand this repo", "trace", and local-context research requests.
+- [ ] Triggers on "explore-context", "investigate", "find where", "understand this repo", "trace", and local-context research requests.
 - [ ] Searches local files, project docs, attached artifacts, and repository context only.
 - [ ] Uses file references, artifact references, and command evidence for findings.
 - [ ] Distinguishes verified facts from inference.
 - [ ] Does not perform web search or browsing as part of this skill.

-#### FR-007: `decide`
+#### FR-007: `decide-direction`

 **Priority:** Must-have
-**Description:** The system shall use `decide` to compare options and recommend a direction.
+**Description:** The system shall use `decide-direction` to compare options and recommend a direction.

 **Acceptance criteria:**

-- [ ] Triggers on "decide", "choose", "which option", "tradeoffs", "recommend", and "should we".
+- [ ] Triggers on "decide-direction", "choose", "which option", "tradeoffs", "recommend", and "should we".
 - [ ] States decision criteria before or alongside the comparison.
 - [ ] Compares viable options against the criteria.
 - [ ] Recommends one option when evidence supports a recommendation.
 - [ ] Notes assumptions, risks, tradeoffs, and reversibility.

-#### FR-008: `coordinate`
+#### FR-008: `coordinate-work`

 **Priority:** Must-have
-**Description:** The system shall use `coordinate` to manage active work across people, agents, tasks, dependencies, blockers, and handoffs.
+**Description:** The system shall use `coordinate-work` to manage active work across people, agents, tasks, dependencies, blockers, and handoffs.

 **Acceptance criteria:**

-- [ ] Triggers on "coordinate", "manage this work", "team lead", "lead this", "assign", "delegate", "track blockers", "status", "handoff", and multi-agent or multi-workstream requests.
+- [ ] Triggers on "coordinate-work", "manage this work", "team lead", "lead this", "assign", "delegate", "track blockers", "status", "handoff", and multi-agent or multi-workstream requests.
 - [ ] Maintains an execution view with goals, owners, dependencies, current status, blockers, and next actions.
 - [ ] Separates active coordination from pre-execution planning.
 - [ ] Makes handoff state clear enough for another human or agent to continue.
 - [ ] Does not silently assign real people to work without user-provided ownership or clear assumptions.

-#### FR-009: `remember`
+#### FR-009: `remember-context`

 **Priority:** Must-have
-**Description:** The system shall use `remember` to preserve durable project facts, decisions, and useful observations in `.agents/memory/`.
+**Description:** The system shall use `remember-context` to preserve durable project facts, decisions, and useful observations in `.agents/memory/`.

 **Acceptance criteria:**

 - [ ] Triggers when the user asks to remember, save context, record a decision, update memory, or preserve a project fact.
-- [ ] Treats explicit user requests to remember as approval to write memory without asking again.
+- [ ] Treats explicit user requests to remember-context as approval to write memory without asking again.
 - [ ] Writes only durable facts, decisions, and observations with project value.
 - [ ] Avoids storing transient task chatter, sensitive information, or unverifiable assumptions as fact.
 - [ ] Follows existing `.agents/memory/MEMORY.md` and dated memory file conventions.
@@ -241,7 +241,7 @@ Each skill body shall define purpose, scope, trigger cases, non-trigger cases, w

 **Acceptance criteria:**

-- [ ] Each skill has `evals/evals.json` generated through `.agents/skills/creator-skill/`.
+- [ ] Each skill has `evals/evals.json` generated through `.agents/skills/create-skill/`.
 - [ ] Each skill has 8-10 realistic eval prompts where possible, and never fewer than the PRD minimum of 7.
 - [ ] Each eval set includes at least 3 true-positive prompts.
 - [ ] Each eval set includes at least 2 false-positive prompts where nearby language should route elsewhere or not trigger.
@@ -250,13 +250,13 @@ Each skill body shall define purpose, scope, trigger cases, non-trigger cases, w

 ### 2.5 Business Rules

-**BR-001:** Skill names are fixed as `ask`, `explain`, `reason`, `classify`, `plan`, `explore`, `decide`, `coordinate`, and `remember`.
+**BR-001:** Skill names are fixed as `ask-questions`, `explain-topic`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, and `remember-context`.

 **BR-002:** Runtime behavior must be standalone. Development-time validation may use existing creator or packaging workflows, but installed skill behavior must not depend on them.

-**BR-003:** `explore` is local-only. Web search, browsing, and current-information research are out of scope.
+**BR-003:** `explore-context` is local-only. Web search, browsing, and current-information research are out of scope.

-**BR-004:** `remember` may write memory automatically only when the user explicitly asks to remember or preserve something.
+**BR-004:** `remember-context` may write memory automatically only when the user explicitly asks to remember or preserve something.

 **BR-005:** Durable task documentation belongs under `docs/`; durable memory facts and small implementation notes belong under `.agents/memory/`.
@@ -267,12 +267,12 @@ Each skill body shall define purpose, scope, trigger cases, non-trigger cases, w | Category | Requirement | Target | Priority | | --- | --- | --- | --- | | Maintainability | Each skill has one clear workflow and avoids becoming a generic behavior dump | Reviewer can summarize each skill in one sentence | High | -| Portability | Skills work across repositories without assuming this repo layout, except `remember` memory conventions | No hard dependency on project-specific files outside documented exceptions | High | +| Portability | Skills work across repositories without assuming this repo layout, except `remember-context` memory conventions | No hard dependency on project-specific files outside documented exceptions | High | | Token efficiency | Main skill files stay concise | Under 500 lines per `SKILL.md` | High | | Trigger accuracy | Trigger and exclusion rules are explicit | Evals include positive, false-positive, and non-trigger prompts | High | -| Source discipline | `explore` cites local evidence and marks inference | Findings include file/artifact references when available | High | -| Memory hygiene | `remember` stores only durable value | No transient chatter or sensitive data in memory notes | High | -| Coordination clarity | `coordinate` preserves execution state | Goals, owners, status, blockers, dependencies, and next actions are explicit | Medium | +| Source discipline | `explore-context` cites local evidence and marks inference | Findings include file/artifact references when available | High | +| Memory hygiene | `remember-context` stores only durable value | No transient chatter or sensitive data in memory notes | High | +| Coordination clarity | `coordinate-work` preserves execution state | Goals, owners, status, blockers, dependencies, and next actions are explicit | Medium | --- @@ -297,20 +297,20 @@ flowchart TD | Component | Responsibility | | --- | --- | | `.agents/skills//SKILL.md` | Runtime instructions, metadata, 
trigger guidance, exclusions, workflow, and output expectations | -| `.agents/skills//evals/evals.json` | Representative trigger and non-trigger prompts generated through `creator-skill` | +| `.agents/skills//evals/evals.json` | Representative trigger and non-trigger prompts generated through `create-skill` | | `.agents/skills//evals/iterations/iteration-N/` | Reproducible eval run outputs, grading, and benchmark artifacts when generated | -| `.agents/memory/` | Target memory location for `remember` behavior | -| `.agents/skills/creator-skill/` | Development-time eval generation, validation, and packaging support | +| `.agents/memory/` | Target memory location for `remember-context` behavior | +| `.agents/skills/create-skill/` | Development-time eval generation, validation, and packaging support | ### 4.3 Key Design Decisions **Decision: Use simple cognitive-mode names.** Chosen names are short and direct because the PRD resolved naming in favor of standalone cognitive modes. The tradeoff is that trigger boundaries must be especially explicit to avoid overlap. -**Decision: Keep `explore` local-only.** +**Decision: Keep `explore-context` local-only.** This prevents accidental current-information research and keeps the skill portable across disconnected or restricted environments. The tradeoff is that users must invoke another workflow for web research. -**Decision: Treat explicit remember requests as approval.** +**Decision: Treat explicit remember-context requests as approval.** This removes a redundant confirmation step when the user has already asked to remember something. The tradeoff is that the skill must filter carefully for durability and sensitivity before writing. **Decision: Store eval prompts per skill.** @@ -322,7 +322,7 @@ Per-skill eval files keep each installable unit self-contained. A shared overlap No database or structured runtime data model is added. 
-The only persistent output introduced by skill behavior is `remember` writing Markdown entries under `.agents/memory/` according to existing conventions. +The only persistent output introduced by skill behavior is `remember-context` writing Markdown entries under `.agents/memory/` according to existing conventions. Memory entries shall use one of these categories when applicable: facts, preferences, decisions, or observations. Decision entries should include context, decision, and revisit conditions when those details are available. @@ -330,25 +330,25 @@ Memory entries shall use one of these categories when applicable: facts, prefere ## 6. Security, Privacy, and Safety -Skills shall not request, store, or expose secrets. `remember` shall avoid writing credentials, tokens, private personal information, transient task chatter, or unverifiable assumptions as fact. +Skills shall not request, store, or expose secrets. `remember-context` shall avoid writing credentials, tokens, private personal information, transient task chatter, or unverifiable assumptions as fact. -`explore` shall not use web browsing, web search, external services, or live integrations. Its findings shall be based on local repository files, local docs, attached artifacts, or clearly marked inference. +`explore-context` shall not use web browsing, web search, external services, or live integrations. Its findings shall be based on local repository files, local docs, attached artifacts, or clearly marked inference. -Skills shall avoid presenting subjective recommendations as facts. `reason` and `decide` shall distinguish assumptions, evidence, opinion, and uncertainty. +Skills shall avoid presenting subjective recommendations as facts. `reason-problem` and `decide-direction` shall distinguish assumptions, evidence, opinion, and uncertainty. --- ## 7. 
Error Paths and Edge Cases -If a user request matches multiple skills, the selected skill shall explain the dominant intent through its output shape, not through a long routing discussion. For example, "Should we plan this migration or split it?" should favor `decide` if the user needs a choice, and `plan` if the choice is already settled. +If a user request matches multiple skills, the selected skill shall explain the dominant intent through its output shape, not through a long routing discussion. For example, "Should we plan this migration or split it?" should favor `decide-direction` if the user needs a choice, and `plan-work` if the choice is already settled. -If required local evidence is missing, `explore` and `explain` shall report what was inspected, what could not be verified, and the best-supported inference. +If required local evidence is missing, `explore-context` and `explain-topic` shall report what was inspected, what could not be verified, and the best-supported inference. -If `remember` receives content that is explicit but not durable, sensitive, or unverifiable, it shall decline the memory write briefly and explain the reason. +If `remember-context` receives content that is explicit but not durable, sensitive, or unverifiable, it shall decline the memory write briefly and explain the reason. -If `coordinate` lacks owners, it shall use unassigned workstreams or assumed role labels instead of inventing real ownership. +If `coordinate-work` lacks owners, it shall use unassigned workstreams or assumed role labels instead of inventing real ownership. -If `classify` receives items that do not fit a single category, it shall use an ambiguous, multi-label, or needs-review grouping rather than forcing a clean bucket. +If `classify-content` receives items that do not fit a single category, it shall use an ambiguous, multi-label, or needs-review grouping rather than forcing a clean bucket. 
--- @@ -360,7 +360,7 @@ Review every `SKILL.md` for frontmatter completeness, trigger specificity, exclu ### 8.2 Trigger Eval Review -For each skill, generate and review `evals/evals.json` through `.agents/skills/creator-skill/`. Use 8-10 realistic prompts where possible, and never fewer than the PRD minimum of 7: +For each skill, generate and review `evals/evals.json` through `.agents/skills/create-skill/`. Use 8-10 realistic prompts where possible, and never fewer than the PRD minimum of 7: ```text 3 true-positive prompts @@ -376,12 +376,12 @@ Boundary prompts shall specifically test likely overlaps: | Boundary | Expected Distinction | | --- | --- | -| `ask` vs `reason` | `ask` produces questions; `reason` develops framing and hypotheses | -| `reason` vs `decide` | `reason` clarifies ambiguity; `decide` recommends between options | -| `plan` vs `coordinate` | `plan` sequences future work; `coordinate` tracks active workstreams and handoffs | -| `explain` vs `explore` | `explain` teaches; `explore` investigates local evidence | -| `classify` vs `decide` | `classify` groups material; `decide` chooses a direction | -| `remember` vs docs writing | `remember` captures durable memory; docs writing creates formal project artifacts | +| `ask-questions` vs `reason-problem` | `ask-questions` produces questions; `reason-problem` develops framing and hypotheses | +| `reason-problem` vs `decide-direction` | `reason-problem` clarifies ambiguity; `decide-direction` recommends between options | +| `plan-work` vs `coordinate-work` | `plan-work` sequences future work; `coordinate-work` tracks active workstreams and handoffs | +| `explain-topic` vs `explore-context` | `explain-topic` teaches; `explore-context` investigates local evidence | +| `classify-content` vs `decide-direction` | `classify-content` groups material; `decide-direction` chooses a direction | +| `remember-context` vs docs writing | `remember-context` captures durable memory; docs writing creates formal project 
artifacts | ### 8.4 Manual Acceptance @@ -393,13 +393,13 @@ Manual acceptance passes when a reviewer can invoke representative prompts and o ### Phase 1: Skill Boundaries -- [ ] Draft `SKILL.md` for `ask`, `reason`, `classify`, `plan`, `explore`, `decide`, `coordinate`, and `remember`. -- [x] Treat existing `explain` as complete for this work. +- [ ] Draft `SKILL.md` for `ask-questions`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, and `remember-context`. +- [x] Treat existing `explain-topic` as complete for this work. - [ ] Confirm each skill has clear trigger and non-trigger rules. ### Phase 2: Evals -- [ ] Generate evals through `.agents/skills/creator-skill/`. +- [ ] Generate evals through `.agents/skills/create-skill/`. - [ ] Add generated evals for each skill. - [ ] Include true-positive, false-positive, and non-trigger prompts. - [ ] Add boundary prompts for common overlaps. @@ -422,8 +422,8 @@ Manual acceptance passes when a reviewer can invoke representative prompts and o | Dependency | Needed By | | --- | --- | | Existing skill authoring conventions | All skill files | -| Existing `.agents/memory/` conventions | `remember` | -| `.agents/skills/creator-skill/` eval generation workflow | Evals and release readiness | +| Existing `.agents/memory/` conventions | `remember-context` | +| `.agents/skills/create-skill/` eval generation workflow | Evals and release readiness | --- @@ -444,8 +444,8 @@ Manual acceptance passes when a reviewer can invoke representative prompts and o | # | Question | Owner | Due | Status | | --- | --- | --- | --- | --- | -| 1 | Should eval prompts be plain Markdown or a machine-readable format? | Oleg Shulyakov | 2026-05-21 | Resolved: evals are generated by `.agents/skills/creator-skill/`. | -| 2 | Should `explain` be treated as already complete or revised to match the new general skill set style? | Oleg Shulyakov | 2026-05-21 | Resolved: mark `explain` as complete. 
| +| 1 | Should eval prompts be plain Markdown or a machine-readable format? | Oleg Shulyakov | 2026-05-21 | Resolved: evals are generated by `.agents/skills/create-skill/`. | +| 2 | Should `explain-topic` be treated as already complete or revised to match the new general skill set style? | Oleg Shulyakov | 2026-05-21 | Resolved: mark `explain-topic` as complete. | | 3 | Should every new skill use version `1.0.0`, or inherit a project-wide initial version convention? | Oleg Shulyakov | 2026-05-21 | Resolved: use `1.0.0` as the initial version. | --- @@ -456,5 +456,5 @@ Related documents: - [PRD.md](PRD.md) - `.agents/memory/MEMORY.md` -- `.agents/skills/creator-skill/SKILL.md` -- `.agents/skills/explain/SKILL.md` +- `.agents/skills/create-skill/SKILL.md` +- `.agents/skills/explain-topic/SKILL.md` diff --git a/docs/2026-05-20-general-agent-skills/user-stories/US-001-author-standalone-general-skills.md b/docs/2026-05-20-general-agent-skills/user-stories/US-001-author-standalone-general-skills.md index a1cabe7..0cda3a4 100644 --- a/docs/2026-05-20-general-agent-skills/user-stories/US-001-author-standalone-general-skills.md +++ b/docs/2026-05-20-general-agent-skills/user-stories/US-001-author-standalone-general-skills.md @@ -14,20 +14,20 @@ Source documents: - **Persona:** As a skill library maintainer, - **Action:** I want the missing general-purpose agent skills authored as standalone installable skill folders, - **Outcome:** so that users can invoke consistent collaboration modes without hidden runtime dependencies. -- **Epic Context:** Implements the approved General Agent Skills PRD/SPEC by creating `ask`, `reason`, `classify`, `plan`, `explore`, `decide`, `coordinate`, and `remember`. Existing `explain` is already complete and must not be rewritten unless validation reveals a spec violation. 
+- **Epic Context:** Implements the approved General Agent Skills PRD/SPEC by creating `ask-questions`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, and `remember-context`. Existing `explain-topic` is already complete and must not be rewritten unless validation reveals a spec violation. --- ## 🔍 2. Strict Constraints & Scope Boundaries - **In-Scope:** - - Create `.agents/skills//SKILL.md` for `ask`, `reason`, `classify`, `plan`, `explore`, `decide`, `coordinate`, and `remember`. + - Create `.agents/skills//SKILL.md` for `ask-questions`, `reason-problem`, `classify-content`, `plan-work`, `explore-context`, `decide-direction`, `coordinate-work`, and `remember-context`. - Use initial skill version `1.0.0`. - Include frontmatter fields required by local skill conventions. - Define each skill's purpose, trigger cases, non-trigger cases, workflow, output expectations, error paths, and verification guidance where relevant. - Keep every skill independently installable and runtime-standalone. - **Out-of-Scope (Do NOT implement):** - - Do not modify `explain` unless a direct mismatch with the approved SPEC is found and documented. + - Do not modify `explain-topic` unless a direct mismatch with the approved SPEC is found and documented. - Do not add live Jira, Linear, Confluence, GitHub Issues, web browsing, web search, or external memory integrations. - Do not create placeholder `references/`, `scripts/`, or `assets/` folders. - Do not make one skill delegate to another skill at runtime. @@ -74,18 +74,18 @@ Scenario: Avoid placeholder support folders *Note to Agent: You are restricted to modifying or analyzing the following components.* - **Primary Target Files:** - 1. `.agents/skills/ask/SKILL.md` -> New question-generation skill. - 2. `.agents/skills/reason/SKILL.md` -> New ambiguous-problem reasoning skill. - 3. `.agents/skills/classify/SKILL.md` -> New classification and grouping skill. - 4. 
`.agents/skills/plan/SKILL.md` -> New planning skill. - 5. `.agents/skills/explore/SKILL.md` -> New local investigation skill. - 6. `.agents/skills/decide/SKILL.md` -> New decision support skill. - 7. `.agents/skills/coordinate/SKILL.md` -> New coordination skill. - 8. `.agents/skills/remember/SKILL.md` -> New durable memory skill. + 1. `.agents/skills/ask-questions/SKILL.md` -> New question-generation skill. + 2. `.agents/skills/reason-problem/SKILL.md` -> New ambiguous-problem reasoning skill. + 3. `.agents/skills/classify-content/SKILL.md` -> New classification and grouping skill. + 4. `.agents/skills/plan-work/SKILL.md` -> New planning skill. + 5. `.agents/skills/explore-context/SKILL.md` -> New local investigation skill. + 6. `.agents/skills/decide-direction/SKILL.md` -> New decision support skill. + 7. `.agents/skills/coordinate-work/SKILL.md` -> New coordination skill. + 8. `.agents/skills/remember-context/SKILL.md` -> New durable memory skill. - **Shared Dependencies/Imports:** - - Follow `.agents/skills/creator-skill/references/authoring.md`. + - Follow `.agents/skills/create-skill/references/authoring.md`. - Use [SPEC.md](../SPEC.md) as the implementation contract. - - Treat `.agents/skills/explain/SKILL.md` as complete. + - Treat `.agents/skills/explain-topic/SKILL.md` as complete. --- @@ -93,7 +93,7 @@ Scenario: Avoid placeholder support folders *Note to Agent: Execute these steps sequentially. Verify state after each step.* -1. **Analyze & Validate:** Read [SPEC.md](../SPEC.md), `.agents/skills/creator-skill/SKILL.md`, and `.agents/skills/creator-skill/references/authoring.md`. +1. **Analyze & Validate:** Read [SPEC.md](../SPEC.md), `.agents/skills/create-skill/SKILL.md`, and `.agents/skills/create-skill/references/authoring.md`. 2. **Create Skill Folders:** Create only the eight missing skill directories and required files. 3. **Author Skill Instructions:** Write focused `SKILL.md` files with explicit trigger and non-trigger behavior. 4. 
**Check Runtime Boundaries:** Search new skill files for runtime dependency language that points to another skill. @@ -107,5 +107,5 @@ Scenario: Avoid placeholder support folders - [ ] **Compilation:** Not applicable; Markdown authoring only. - [ ] **Test Coverage:** New skill files are ready for eval generation in US-002. -- [ ] **No Regression:** Existing `.agents/skills/explain/SKILL.md` remains unchanged unless a documented spec mismatch required a fix. +- [ ] **No Regression:** Existing `.agents/skills/explain-topic/SKILL.md` remains unchanged unless a documented spec mismatch required a fix. - [ ] **Idempotency:** Re-running the work does not duplicate folders, sections, or placeholder resources. diff --git a/docs/2026-05-20-general-agent-skills/user-stories/US-002-generate-skill-evals.md b/docs/2026-05-20-general-agent-skills/user-stories/US-002-generate-skill-evals.md index 0211ab2..743831b 100644 --- a/docs/2026-05-20-general-agent-skills/user-stories/US-002-generate-skill-evals.md +++ b/docs/2026-05-20-general-agent-skills/user-stories/US-002-generate-skill-evals.md @@ -14,7 +14,7 @@ Source documents: - **Persona:** As a skill library maintainer, - **Action:** I want representative evals generated for each general skill, - **Outcome:** so that trigger behavior and near-miss boundaries can be reviewed before release. -- **Epic Context:** Implements FR-011 from the approved SPEC. Evals are generated through `.agents/skills/creator-skill/` and stored inside each skill folder. +- **Epic Context:** Implements FR-011 from the approved SPEC. Evals are generated through `.agents/skills/create-skill/` and stored inside each skill folder. --- @@ -31,7 +31,7 @@ Source documents: - Do not store evals in a shared docs folder. - Do not create eval iteration output folders unless eval runs are actually executed. - **Data Models & Schemas:** - - Use the eval schema expected by `.agents/skills/creator-skill/`. 
+ - Use the eval schema expected by `.agents/skills/create-skill/`. - Store eval cases at `.agents/skills//evals/evals.json`. - Store run outputs only under `.agents/skills//evals/iterations/iteration-N/` if runs are performed. @@ -44,7 +44,7 @@ Source documents: ```gherkin Scenario: Generate evals for each skill Given the eight new general skills exist - When the agent generates evals through creator-skill conventions + When the agent generates evals through create-skill conventions Then each new skill has evals/evals.json And each eval file contains 8-10 realistic prompts where possible And no eval file contains fewer than 7 prompts @@ -69,16 +69,16 @@ Scenario: Preserve eval folder discipline *Note to Agent: You are restricted to modifying or analyzing the following components.* - **Primary Target Files:** - 1. `.agents/skills/ask/evals/evals.json` -> Trigger and output evals. - 2. `.agents/skills/reason/evals/evals.json` -> Trigger and output evals. - 3. `.agents/skills/classify/evals/evals.json` -> Trigger and output evals. - 4. `.agents/skills/plan/evals/evals.json` -> Trigger and output evals. - 5. `.agents/skills/explore/evals/evals.json` -> Trigger and output evals. - 6. `.agents/skills/decide/evals/evals.json` -> Trigger and output evals. - 7. `.agents/skills/coordinate/evals/evals.json` -> Trigger and output evals. - 8. `.agents/skills/remember/evals/evals.json` -> Trigger and output evals. + 1. `.agents/skills/ask-questions/evals/evals.json` -> Trigger and output evals. + 2. `.agents/skills/reason-problem/evals/evals.json` -> Trigger and output evals. + 3. `.agents/skills/classify-content/evals/evals.json` -> Trigger and output evals. + 4. `.agents/skills/plan-work/evals/evals.json` -> Trigger and output evals. + 5. `.agents/skills/explore-context/evals/evals.json` -> Trigger and output evals. + 6. `.agents/skills/decide-direction/evals/evals.json` -> Trigger and output evals. + 7. 
`.agents/skills/coordinate-work/evals/evals.json` -> Trigger and output evals. + 8. `.agents/skills/remember-context/evals/evals.json` -> Trigger and output evals. - **Shared Dependencies/Imports:** - - Follow `.agents/skills/creator-skill/references/evaluation.md`. + - Follow `.agents/skills/create-skill/references/evaluation.md`. - Use boundary distinctions from [SPEC.md](../SPEC.md). --- @@ -87,11 +87,11 @@ Scenario: Preserve eval folder discipline *Note to Agent: Execute these steps sequentially. Verify state after each step.* -1. **Analyze & Validate:** Read [SPEC.md](../SPEC.md) Section 8 and `.agents/skills/creator-skill/references/evaluation.md`. +1. **Analyze & Validate:** Read [SPEC.md](../SPEC.md) Section 8 and `.agents/skills/create-skill/references/evaluation.md`. 2. **Generate Eval Cases:** Create prompt-level evals for each new skill. 3. **Check Counts:** Verify each eval file meets the 8-10 target where possible and never drops below 7. 4. **Check Boundary Coverage:** Confirm likely overlaps are represented across the relevant eval files. -5. **Validate JSON:** Ensure every `evals.json` file is valid JSON and follows the local creator-skill expectations. +5. **Validate JSON:** Ensure every `evals.json` file is valid JSON and follows the local create-skill expectations. --- diff --git a/docs/2026-05-20-general-agent-skills/user-stories/US-003-validate-skills-and-update-index.md b/docs/2026-05-20-general-agent-skills/user-stories/US-003-validate-skills-and-update-index.md index 73f112e..45d16ea 100644 --- a/docs/2026-05-20-general-agent-skills/user-stories/US-003-validate-skills-and-update-index.md +++ b/docs/2026-05-20-general-agent-skills/user-stories/US-003-validate-skills-and-update-index.md @@ -21,7 +21,7 @@ Source documents: ## 🔍 2. Strict Constraints & Scope Boundaries - **In-Scope:** - - Run available creator-skill validation checks on each new skill. + - Run available create-skill validation checks on each new skill. 
- Review line counts, metadata, section style, and runtime standalone behavior. - Update `.agents/skills/README.md` if it indexes maintained skills. - Fix validation failures that are directly related to the new skills. @@ -43,7 +43,7 @@ Source documents: ```gherkin Scenario: Validate each new skill Given a new skill folder exists with SKILL.md and evals/evals.json - When creator-skill validation is run against the skill folder + When create-skill validation is run against the skill folder Then validation passes And any failures are fixed or documented with a clear reason @@ -71,8 +71,8 @@ Scenario: Prevent runtime coupling 2. `.agents/skills//evals/evals.json` -> Validation target. 3. `.agents/skills/README.md` -> Skill index, if present. - **Shared Dependencies/Imports:** - - Use `.agents/skills/creator-skill/scripts/quick_validate.py` when available. - - Follow `.agents/skills/creator-skill/references/authoring.md`. + - Use `.agents/skills/create-skill/scripts/quick_validate.py` when available. + - Follow `.agents/skills/create-skill/references/authoring.md`. --- @@ -80,7 +80,7 @@ Scenario: Prevent runtime coupling *Note to Agent: Execute these steps sequentially. Verify state after each step.* -1. **Analyze & Validate:** Inspect `.agents/skills/README.md` and creator-skill validation scripts. +1. **Analyze & Validate:** Inspect `.agents/skills/README.md` and create-skill validation scripts. 2. **Run Validation:** Run quick validation for each new skill directory. 3. **Fix Failures:** Apply focused fixes to new skill files and evals. 4. **Update Index:** Add new skills to the README only if the README indexes maintained skills.