Merged
2 changes: 1 addition & 1 deletion .agents/skills/README.md
@@ -55,7 +55,7 @@ python3 -m scripts.package_skill ../code-database /tmp/skills-dist
Use this validation command when changing an existing skill:

```bash
-python3 .agents/skills/create-skill/scripts/quick_validate.py .agents/skills/code-database
+python3 .agents/skills/create-skill/scripts/validate.py .agents/skills/code-database
```

The key rule is simple: keep `SKILL.md` and any files it references together. If a skill says to read `references/postgres.md`, that file must remain available relative to the skill folder. Tiny rule, large consequences. Filesystems enjoy pettiness.
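The "keep referenced files together" rule can be checked mechanically. The following is a minimal sketch, not the repository's validator: the function name `missing_references` and the path pattern (backticked paths under `references/`, `scripts/`, or `assets/`) are assumptions made for illustration.

```python
import re
from pathlib import Path

# Hypothetical pattern: backticked relative paths under a few well-known folders.
# Glob-style mentions such as `references/*.md` are deliberately not matched.
REF_PATTERN = re.compile(r"`((?:references|scripts|assets)/[\w./-]+)`")


def missing_references(skill_dir: Path) -> list[str]:
    """Return paths mentioned in SKILL.md that do not exist under skill_dir."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    mentioned = sorted(set(REF_PATTERN.findall(text)))
    # Paths are resolved relative to the skill folder, as the rule requires.
    return [rel for rel in mentioned if not (skill_dir / rel).is_file()]
```

An empty result means every path the pattern recognizes is still present next to `SKILL.md`; anything listed would break once the skill folder is moved or packaged.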
8 changes: 4 additions & 4 deletions .agents/skills/create-skill/SKILL.md
@@ -8,7 +8,7 @@ tags:
- authoring
metadata:
author: Anthropic
-version: "1.8.0"
+version: "1.9.0"
source: github.com/anthropics/skills
catalog: utility
category: meta
@@ -31,13 +31,13 @@ Create new skills, review and improve existing skills, evaluate outputs, optimiz
| Build eval cases, run iterations, benchmark outputs, or collect human feedback | `references/evaluation.md` |
| Optimize a skill description for trigger accuracy | `references/description-optimization.md` |
| Adapt the workflow for agents without subagents, Claude Code, generic CLIs, or Cowork | `references/agent-compatibility.md` |
-| Validate eval, grading, benchmark, or feedback JSON structures | `references/schemas.md` |
+| Validate eval YAML or grading, benchmark, and feedback JSON structures | `references/schemas.md` |

If the request spans multiple phases, read the references in workflow order: authoring, review, evaluation, description optimization, then agent compatibility only when platform details matter.

2. **Clarify activation and behavior.** Identify what the skill should do, which user phrases or contexts should trigger it, what output it should produce, and whether objective evals are useful.
3. **Write or revise the skill.** Name new skills using the `<verb>-<subject>[-<variant>]` convention or a concise `<verb>` format (e.g., `code-tests`, `ask`). Follow `references/authoring.md` for metadata, trigger descriptions, `SKILL.md` body format, reference file format, section delimiters, scan anchors, examples, helper scripts, portability, and validation. Always bump `metadata.version` using semantic versioning upon any material change to a skill's files.
-4. **Test behavior.** Run this skill's `scripts/quick_validate.py` against the target skill when available. For router skills, confirm every `references/*.md` file has 8-10 evals mapped by `reference`; for objectively testable skills, run skill-enabled outputs against a meaningful baseline.
+4. **Test behavior.** Run this skill's `scripts/validate.py` against the target skill when available. For router skills, confirm every `references/*.md` file has 8-10 evals mapped by `reference`; for objectively testable skills, run skill-enabled outputs against a meaningful baseline.
5. **Show evidence.** Share validation output, eval results, benchmark summaries, and relevant diffs before making another revision.
6. **Iterate deliberately.** Continue until feedback is resolved or further changes stop improving behavior.
7. **Package last.** Package the final skill only after the user is satisfied with behavior and trigger accuracy.
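Step 3 asks for a semantic-version bump of `metadata.version` on any material change, and this pull request does exactly that (1.8.0 to 1.9.0). A minimal sketch of such a bump follows; the helper name `bump_version` and the default of bumping the minor part are assumptions for illustration, and pre-release or build suffixes are not handled.

```python
def bump_version(version: str, part: str = "minor") -> str:
    """Bump a plain MAJOR.MINOR.PATCH version string (hypothetical helper)."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # lower parts reset to zero
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("1.8.0"))           # 1.9.0
print(bump_version("1.8.0", "patch"))  # 1.8.1
```

Which part to bump for a given change remains a judgment call that the workflow text leaves to the author.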
@@ -65,7 +65,7 @@ Create new skills, review and improve existing skills, evaluate outputs, optimiz
## Bundled Resources

- **Trigger optimization**: `scripts/run_eval.py`, `scripts/run_loop.py`, and `scripts/improve_description.py`
-- **Validation**: `scripts/quick_validate.py`
+- **Validation**: `scripts/validate.py`
- **Benchmark summaries**: `scripts/aggregate_benchmark.py`
- **Packaging**: `scripts/package_skill.py`
- **Human review UI**: `eval-viewer/generate_review.py`
17 changes: 15 additions & 2 deletions .agents/skills/create-skill/assets/eval_review.html
@@ -270,11 +270,24 @@ <h1>Eval Set Review: <span id="skill-name">__SKILL_NAME_PLACEHOLDER__</span></h1
 function exportEvalSet() {
   const valid = evalItems.filter((i) => i.query.trim() !== "");
   const data = valid.map((i) => ({ query: i.query.trim(), should_trigger: i.should_trigger }));
-  const blob = new Blob([JSON.stringify(data, null, 2)], { type: "application/json" });
+  const yaml = [
+    `name: ${JSON.stringify(skillName + " trigger evals")}`,
+    "suites:",
+    "  trigger-routing:",
+    "    description: Trigger and non-trigger routing checks.",
+    "    cases:",
+    ...data.flatMap((item, idx) => [
+      `      case-${String(idx + 1).padStart(3, "0")}:`,
+      `        query: ${JSON.stringify(item.query)}`,
+      `        should_trigger: ${item.should_trigger ? "true" : "false"}`,
+    ]),
+    "",
+  ].join("\n");
+  const blob = new Blob([yaml], { type: "application/x-yaml" });
   const url = URL.createObjectURL(blob);
   const a = document.createElement("a");
   a.href = url;
-  a.download = "eval_set.json";
+  a.download = "eval_set.yaml";
   document.body.appendChild(a);
   a.click();
   document.body.removeChild(a);
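To make the shape of the exported file concrete, here is a rough Python sketch that mirrors the export logic in the hunk above. It is not part of the pull request: the function name `build_eval_yaml` is hypothetical, two-space indentation per nesting level is an assumption, and `json.dumps(..., ensure_ascii=False)` stands in for the browser's `JSON.stringify` when quoting strings.

```python
import json


def build_eval_yaml(skill_name: str, items: list[dict]) -> str:
    """Build eval-set YAML text from items with 'query' and 'should_trigger' keys."""
    # Drop items whose query is empty after trimming, as the export does.
    valid = [item for item in items if item["query"].strip()]
    lines = [
        f"name: {json.dumps(skill_name + ' trigger evals', ensure_ascii=False)}",
        "suites:",
        "  trigger-routing:",
        "    description: Trigger and non-trigger routing checks.",
        "    cases:",
    ]
    for idx, item in enumerate(valid, start=1):
        lines += [
            f"      case-{idx:03d}:",  # zero-padded like padStart(3, "0")
            f"        query: {json.dumps(item['query'].strip(), ensure_ascii=False)}",
            f"        should_trigger: {'true' if item['should_trigger'] else 'false'}",
        ]
    lines.append("")  # trailing newline after the join
    return "\n".join(lines)


print(build_eval_yaml("code-database", [
    {"query": "find rows in postgres", "should_trigger": True},
    {"query": "   ", "should_trigger": False},
]))
```

Quoting each query as a JSON string keeps colons, hashes, and quotes inside a query from being read as YAML syntax, since a JSON string is also a valid YAML double-quoted scalar.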