From 3f0a2fc5fccc47272ed37e27fc34c18a80d9bc51 Mon Sep 17 00:00:00 2001 From: Oleg Shulyakov Date: Sun, 24 May 2026 22:54:59 +0300 Subject: [PATCH] feat(skills): update create-skill authoring rules - Restructured SKILL.md with simplified Workflow, Output, and Boundaries sections - Removed required bold principle sentence after each `##` heading - Updated authoring reference for flexible body shape, scan anchors, and STAR examples - Removed bold principle validation from `quick_validate.py` --- .agents/skills/create-skill/SKILL.md | 82 +++++++++---------- .../create-skill/references/authoring.md | 16 ++-- .../create-skill/scripts/quick_validate.py | 7 +- 3 files changed, 51 insertions(+), 54 deletions(-) diff --git a/.agents/skills/create-skill/SKILL.md b/.agents/skills/create-skill/SKILL.md index 107f273..ebf46e3 100644 --- a/.agents/skills/create-skill/SKILL.md +++ b/.agents/skills/create-skill/SKILL.md @@ -1,6 +1,6 @@ --- name: create-skill -description: Use when creating, editing, reviewing, evaluating, packaging, optimizing or improving skills. +description: Create, edit, review, evaluate, package, and optimize skills. Use when users ask to create a skill, revise skill instructions, review a skill, run skill evals, benchmark skill performance, package a skill, or optimize a skill description for trigger accuracy. license: Apache-2.0 tags: - creator @@ -8,7 +8,7 @@ tags: - authoring metadata: author: Anthropic - version: "1.5.0" + version: "1.6.0" source: github.com/anthropics/skills catalog: utility category: meta @@ -20,64 +20,62 @@ Create new skills, review and improve existing skills, evaluate outputs, optimiz --- -## Route the Work +## Workflow -**Read only the reference that matches the user's current task.** +1. **Route the request.** Read only the reference that matches the user's current task: -| User intent | Read | -| --- | --- | -| Create a new skill or revise skill instructions | `references/authoring.md` | -| Review a created or revised skill | `references/review.md` | -| Build eval cases, run iterations, benchmark outputs, or collect human feedback | `references/evaluation.md` | -| Optimize a skill description for trigger accuracy | `references/description-optimization.md` | -| Adapt the workflow for agents without subagents, Claude Code, generic CLIs, or Cowork | `references/agent-compatibility.md` | -| Validate eval, grading, benchmark, or feedback JSON structures | `references/schemas.md` | + | User intent | Read | + | --- | --- | + | Create a new skill or revise skill instructions | `references/authoring.md` | + | Review a created or revised skill | `references/review.md` | + | Build eval cases, run iterations, benchmark outputs, or collect human feedback | `references/evaluation.md` | + | Optimize a skill description for trigger accuracy | `references/description-optimization.md` | + | Adapt the workflow for agents without subagents, Claude Code, generic CLIs, or Cowork | `references/agent-compatibility.md` | + | Validate eval, grading, benchmark, or feedback JSON structures | `references/schemas.md` | -If the request spans multiple phases, read the references in workflow order: authoring, review, evaluation, description optimization, then agent compatibility only when platform details matter. + If the request spans multiple phases, read the references in workflow order: authoring, review, evaluation, description optimization, then agent compatibility only when platform details matter. ---- +2. **Clarify activation and behavior.** Identify what the skill should do, which user phrases or contexts should trigger it, what output it should produce, and whether objective evals are useful. +3. **Write or revise the skill.** Follow `references/authoring.md` for metadata, trigger descriptions, body format, section delimiters, scan anchors, examples, helper scripts, portability, and validation. +4. **Test behavior.** Run this skill's `scripts/quick_validate.py` against the target skill when available. For router skills, confirm every `references/*.md` file has 8-10 evals mapped by `reference`; for objectively testable skills, run skill-enabled outputs against a meaningful baseline. +5. **Show evidence.** Share validation output, eval results, benchmark summaries, and relevant diffs before making another revision. +6. **Iterate deliberately.** Continue until feedback is resolved or further changes stop improving behavior. +7. **Package last.** Package the final skill only after the user is satisfied with behavior and trigger accuracy. -## Core Workflow +--- -**Clarify, write, test, show, iterate, and package — in that order.** +## Output -1. **Clarify activation and behavior**: identify what the skill should do, which user phrases or contexts should trigger it, what output it should produce, and whether objective evals are useful. -2. **Write the skill**: revise `SKILL.md` with concise metadata, focused instructions, bold scan anchors, and references for details that would bloat the main file. -3. **Test behavior**: run this skill's `scripts/quick_validate.py` against the target skill; for router skills, confirm every `references/*.md` file has 8-10 evals mapped by `reference`; for objectively testable skills, run skill-enabled outputs against a meaningful baseline. -4. **Show evidence**: share outputs and benchmark results with the user before making another revision. -5. **Iterate deliberately**: continue until feedback is resolved or further changes stop improving results. -6. **Package last**: package the final skill only after the user is satisfied with behavior and trigger accuracy. +- **For skill edits:** summarize changed files, behavior changes, and validation results. +- **For reviews:** lead with correctness, trigger, structure, safety, and test coverage findings. +- **For evals or benchmarks:** report the command, dataset or eval set, pass/fail counts, variance notes, and recommended next change. +- **For packaging:** report the generated artifact path and any remaining manual checks. --- -## Skill Authoring Rules - -**One skill, one workflow, one clear trigger — no more.** - -- **Section delimiters**: place a standalone `---` between `##` sections in authored `SKILL.md` files so models see strong structural boundaries. -- **Section principles**: open each `##` section with a single bold sentence that states the section's core principle. -- **Execution-first bodies**: start the body with the section that tells the agent what to do, usually `## Workflow`, `## Source Handling`, or `## Route the Work`; put `## Boundaries` after the main workflow or output guidance unless safety requires earlier placement. -- **Scan anchors**: use bold labels for distinct rule bullets in prose skill docs unless the section is a schema, command example, or literal output template. -- **Size discipline**: keep metadata under 100 tokens and the main instruction body under 500 lines; use references for anything that would push past that. -- **Metadata fields**: use only `name`, `description`, `license`, `tags`, and `metadata` at the top level; put `author`, `version`, `source`, `catalog`, `category`, and `references` under `metadata`. -- **Reference metadata**: use `metadata.references` for local skills or rules the skill uses as part of its workflow, including router skills that name follow-up skills as intended routes. Do not list adjacent-skill, near-miss, boundary, or example-only mentions. -- **Pushy descriptions**: explicitly name the user phrases and contexts that should trigger the skill, not just what it does. Claude tends to undertrigger, so err toward specificity. -- **Trigger placement**: put all "when to use" and skill-call scope information in the frontmatter `description`; do not add a body `Scope` section for activation criteria. Put routing, exclusions, boundaries, examples, and detailed procedures in the body or references. -- **No placeholders**: add `scripts/`, `references/`, `assets/`, or `evals/` only when the skill actually uses them. -- **Deterministic helpers**: prefer scripts for repetitive validation, grading, packaging, and report generation. -- **STAR examples**: write examples and eval prompts so reviewers can see the situation, task, expected action, and result criteria. -- **SOLID code**: keep responsibilities clear, interfaces small, and dependencies explicit without adding unnecessary layers in code-generation skills and bundled helper scripts. -- **Portability**: keep skills portable across agents unless the user asks for one specific runtime. Isolate platform-specific behavior in a compatibility section or reference. +## Boundaries + +- **Use the reference as source of truth.** Do not duplicate detailed authoring, review, evaluation, schema, or compatibility rules in this file. +- **Keep skills focused.** Center each skill on one trigger and one workflow; route broad domains through references instead of expanding the main body. +- **Avoid placeholders.** Add `scripts/`, `references/`, `assets/`, or `evals/` only when the skill actually uses them. +- **Package after behavior.** Do not package a skill before the user is satisfied with behavior and trigger accuracy. --- ## Bundled Resources -**Scripts and agents cover the full eval, grading, and packaging loop.** - - **Trigger optimization**: `scripts/run_eval.py`, `scripts/run_loop.py`, and `scripts/improve_description.py` - **Validation**: `scripts/quick_validate.py` - **Benchmark summaries**: `scripts/aggregate_benchmark.py` - **Packaging**: `scripts/package_skill.py` - **Human review UI**: `eval-viewer/generate_review.py` - **Review agents**: `agents/grader.md`, `agents/comparator.md`, `agents/analyzer.md`, and `agents/benchmark-analyzer.md` + +--- + +## Verification + +- [ ] The selected reference matches the user's current task +- [ ] Skill edits follow `references/authoring.md` +- [ ] Validation, eval, benchmark, or packaging commands were run when applicable +- [ ] Results and remaining risks are reported to the user diff --git a/.agents/skills/create-skill/references/authoring.md b/.agents/skills/create-skill/references/authoring.md index 591a37f..7eef8cc 100644 --- a/.agents/skills/create-skill/references/authoring.md +++ b/.agents/skills/create-skill/references/authoring.md @@ -22,7 +22,7 @@ Ground the draft in real source material when available: prior task traces, exis **Keep the top-level skill file focused on trigger, routing, and shared workflow.** -Required frontmatter fields are `name` and `description`. Optional top-level fields are `license`, `tags`, and `metadata`. Put `author`, `version`, `source`, `catalog`, `category`, and `references` inside `metadata`. Keep the complete frontmatter under 100 tokens. The description is the primary trigger signal, so include the core task and strongest trigger contexts, but avoid long keyword inventories. +Required frontmatter fields are `name` and `description`. Optional top-level fields are `license`, `tags`, and `metadata`. Put `author`, `version`, `source`, `catalog`, `category`, and `references` inside `metadata`. Keep the complete frontmatter under 100 tokens. The description is the primary trigger signal, so write it as action plus activation cues: name the task the skill performs, the contexts that should trigger it, representative user phrases, and exclusions only when they prevent likely misfires. Use these metadata fields: @@ -47,19 +47,21 @@ Common nested metadata fields: Use `metadata.references` when this skill actually uses another local skill or rule as part of its workflow. Include a referenced item when the body tells the agent to use, apply, delegate to, run, or route follow-up work to that skill/rule. Do not include skills that appear only as adjacent alternatives, near misses, exclusions, boundaries, or examples of work this skill should not handle. -Keep the Markdown body under 500 lines. The body should explain workflow, routing decisions, boundaries, critical rules, and output format. Move deep detail into `references/` and point to it clearly. Do not use a body `Scope` section to describe when the skill should be called; that belongs in `description` per the Agent Skills spec. +Keep the Markdown body under 500 lines. The default body shape is a `#` title, one short purpose sentence, a standalone `---`, then `## Workflow`, `## Output`, `## Boundaries`, and `## Verification`. Add `## Error Paths` when failures need explicit handling. Use specialized sections such as `## Route the Work`, `## Source Handling`, or `## Bundled Resources` only when they replace or extend the default flow. Move deep detail into `references/` and point to it clearly. Do not use a body `Scope` section to describe when the skill should be called; that belongs in `description` per the Agent Skills spec. Start the body with the section that helps the activated agent act: usually `## Workflow`, `## Source Handling`, or `## Route the Work`. Put `## Boundaries` after the main workflow or output guidance so the file opens with execution, not limits. Place boundaries first only when safety or destructive behavior must be checked before any action. Apply the house Markdown style while writing, not as a later cleanup pass: - **Section delimiters**: place a standalone `---` between `##` sections in `SKILL.md`. Keep the YAML frontmatter delimiters unchanged, and do not add an extra delimiter immediately after the frontmatter or before the `#` title. -- **Section principles**: open each `##` section with a single bold sentence that states the section's core principle. -- **Rule bullets**: use bold labels as scan anchors when each bullet is a distinct rule. +- **Intro purpose**: after the `#` title, write one short sentence that states what the skill does, then place `---` before the first `##` section. +- **Scan anchors**: use bold labels inside steps or bullets when they make distinct actions, fields, or rules easier to scan. Do not require a bold principle sentence after each `##` heading. - **Template exceptions**: do not force bold labels into schemas, command examples, literal output templates, or checklist items where they would make the example less accurate. After editing, run `create-skill/scripts/quick_validate.py ` when this skill's scripts are available. Treat style failures as authoring bugs, not optional polish. +Prefer deterministic helper scripts for repetitive validation, grading, packaging, report generation, or other mechanical checks that would otherwise be reimplemented by hand. + For router skills with `references/*.md`, create `evals/evals.json` before validation is considered complete. Each eval must include a `reference` field that points to the routed reference, and every non-schema reference must have 8-10 evals. This keeps the router honest instead of giving it one polite smoke test and hoping for the best. ## Length Budgets @@ -106,10 +108,12 @@ Avoid relying on one agent's tool names, slash commands, event stream, or UI unl **Write like a compact operating procedure for another agent.** -Use imperative instructions. Explain why constraints matter instead of stacking brittle all-caps rules. Include examples when they prevent ambiguity, but keep examples short and move large examples into references. +Use imperative instructions. Explain why constraints matter instead of stacking brittle all-caps rules. Include a `### Example` subsection under the relevant `##` section only when it clarifies behavior, boundaries, or output shape. Write examples and eval prompts so reviewers can see the situation, task, expected action, and result criteria. Keep examples short and move large examples into references. -Use bold scan anchors consistently so another agent can skim section principles and rule labels before reading details. +Use bold scan anchors where they help another agent skim distinct actions, fields, or rules before reading details. Do not add bold principle sentences after `##` headings just for style compliance. Use standalone `---` delimiters between `##` sections so long skill files segment cleanly in model context. +For code-generation skills and bundled helper scripts, keep responsibilities clear, interfaces small, and dependencies explicit without adding unnecessary layers. + Skills must not contain malware, hidden exfiltration behavior, credential capture, or instructions that would surprise the user relative to the skill description. diff --git a/.agents/skills/create-skill/scripts/quick_validate.py b/.agents/skills/create-skill/scripts/quick_validate.py index 372d1a1..a6ba6fd 100755 --- a/.agents/skills/create-skill/scripts/quick_validate.py +++ b/.agents/skills/create-skill/scripts/quick_validate.py @@ -44,13 +44,8 @@ def validate_markdown_style(skill_path): if line.startswith("## "): line_number = index + 1 - if index + 2 >= len(lines): - return False, f"{path}:{line_number}: missing bold principle after heading" - if lines[index + 1] != "": + if index + 1 >= len(lines) or lines[index + 1] != "": return False, f"{path}:{line_number}: expected blank line after heading" - principle = lines[index + 2] - if not re.match(r'^\*\*.+\*\*$', principle): - return False, f"{path}:{line_number}: expected bold principle sentence after heading" for line_number, line in iter_markdown_lines(path): if line.startswith("- ") and not line.startswith("- **") and not line.startswith("- [ ]"):