PromptPasture · olegshulyakov · May 25, 2026
diff --git a/.agents/skills/README.md b/.agents/skills/README.md
@@ -55,7 +55,7 @@ python3 -m scripts.package_skill ../code-database /tmp/skills-dist
 Use this validation command when changing an existing skill:
 
 ```bash
-python3 .agents/skills/create-skill/scripts/quick_validate.py .agents/skills/code-database
+python3 .agents/skills/create-skill/scripts/validate.py .agents/skills/code-database
 ```
 
 The key rule is simple: keep `SKILL.md` and any files it references together. If a skill says to read `references/postgres.md`, that file must remain available relative to the skill folder. Tiny rule, large consequences. Filesystems enjoy pettiness.
diff --git a/.agents/skills/create-skill/SKILL.md b/.agents/skills/create-skill/SKILL.md
@@ -8,7 +8,7 @@ tags:
   - authoring
 metadata:
   author: Anthropic
-  version: "1.8.0"
+  version: "1.9.0"
   source: github.com/anthropics/skills
   catalog: utility
   category: meta
@@ -31,13 +31,13 @@ Create new skills, review and improve existing skills, evaluate outputs, optimiz
    | Build eval cases, run iterations, benchmark outputs, or collect human feedback | `references/evaluation.md` |
    | Optimize a skill description for trigger accuracy | `references/description-optimization.md` |
    | Adapt the workflow for agents without subagents, Claude Code, generic CLIs, or Cowork | `references/agent-compatibility.md` |
-   | Validate eval, grading, benchmark, or feedback JSON structures | `references/schemas.md` |
+   | Validate eval YAML or grading, benchmark, and feedback JSON structures | `references/schemas.md` |
 
    If the request spans multiple phases, read the references in workflow order: authoring, review, evaluation, description optimization, then agent compatibility only when platform details matter.
 
 2. **Clarify activation and behavior.** Identify what the skill should do, which user phrases or contexts should trigger it, what output it should produce, and whether objective evals are useful.
 3. **Write or revise the skill.** Name new skills using the `<verb>-<subject>[-<variant>]` convention or a concise `<verb>` format (e.g., `code-tests`, `ask`). Follow `references/authoring.md` for metadata, trigger descriptions, `SKILL.md` body format, reference file format, section delimiters, scan anchors, examples, helper scripts, portability, and validation. Always bump `metadata.version` using semantic versioning upon any material change to a skill's files.
-4. **Test behavior.** Run this skill's `scripts/quick_validate.py` against the target skill when available. For router skills, confirm every `references/*.md` file has 8-10 evals mapped by `reference`; for objectively testable skills, run skill-enabled outputs against a meaningful baseline.
+4. **Test behavior.** Run this skill's `scripts/validate.py` against the target skill when available. For router skills, confirm every `references/*.md` file has 8-10 evals mapped by `reference`; for objectively testable skills, run skill-enabled outputs against a meaningful baseline.
 5. **Show evidence.** Share validation output, eval results, benchmark summaries, and relevant diffs before making another revision.
 6. **Iterate deliberately.** Continue until feedback is resolved or further changes stop improving behavior.
 7. **Package last.** Package the final skill only after the user is satisfied with behavior and trigger accuracy.
@@ -65,7 +65,7 @@ Create new skills, review and improve existing skills, evaluate outputs, optimiz
 ## Bundled Resources
 
 - **Trigger optimization**: `scripts/run_eval.py`, `scripts/run_loop.py`, and `scripts/improve_description.py`
-- **Validation**: `scripts/quick_validate.py`
+- **Validation**: `scripts/validate.py`
 - **Benchmark summaries**: `scripts/aggregate_benchmark.py`
 - **Packaging**: `scripts/package_skill.py`
 - **Human review UI**: `eval-viewer/generate_review.py`

diff --git a/.agents/skills/create-skill/assets/eval_review.html b/.agents/skills/create-skill/assets/eval_review.html
@@ -270,11 +270,24 @@ <h1>Eval Set Review: <span id="skill-name">__SKILL_NAME_PLACEHOLDER__</span></h1
       function exportEvalSet() {
         const valid = evalItems.filter((i) => i.query.trim() !== "");
         const data = valid.map((i) => ({ query: i.query.trim(), should_trigger: i.should_trigger }));
-        const blob = new Blob([JSON.stringify(data, null, 2)], { type: "application/json" });
+        const yaml = [
+          `name: ${JSON.stringify(skillName + " trigger evals")}`,
+          "suites:",
+          "  trigger-routing:",
+          "    description: Trigger and non-trigger routing checks.",
+          "    cases:",
+          ...data.flatMap((item, idx) => [
+            `      case-${String(idx + 1).padStart(3, "0")}:`,
+            `        query: ${JSON.stringify(item.query)}`,
+            `        should_trigger: ${item.should_trigger ? "true" : "false"}`,
+          ]),
+          "",
+        ].join("\n");
+        const blob = new Blob([yaml], { type: "application/x-yaml" });
         const url = URL.createObjectURL(blob);
         const a = document.createElement("a");
         a.href = url;
-        a.download = "eval_set.json";
+        a.download = "eval_set.yaml";
         document.body.appendChild(a);
         a.click();
         document.body.removeChild(a);
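
The `exportEvalSet()` change above builds the YAML by hand and relies on `JSON.stringify` to quote string scalars, which works because a JSON string is also a valid YAML double-quoted scalar. As a rough sketch of the resulting file shape, the same logic can be pulled into a standalone function. The name `buildEvalYaml` and the sample inputs are hypothetical and not part of this PR; only the template lines are taken from the diff.

```javascript
// Hypothetical helper (not in the PR): same line-building logic as the
// exportEvalSet() hunk, extracted so it can run outside the browser.
function buildEvalYaml(skillName, items) {
  const lines = [
    // JSON.stringify escapes quotes and backslashes, producing a
    // double-quoted scalar that YAML parsers accept.
    `name: ${JSON.stringify(skillName + " trigger evals")}`,
    "suites:",
    "  trigger-routing:",
    "    description: Trigger and non-trigger routing checks.",
    "    cases:",
    ...items.flatMap((item, idx) => [
      // Case keys are 1-based and zero-padded to three digits.
      `      case-${String(idx + 1).padStart(3, "0")}:`,
      `        query: ${JSON.stringify(item.query)}`,
      `        should_trigger: ${item.should_trigger ? "true" : "false"}`,
    ]),
    "", // trailing empty entry so the file ends with a newline
  ];
  return lines.join("\n");
}

// Sample data is made up for illustration.
const sample = buildEvalYaml("code-database", [
  { query: "design a postgres schema", should_trigger: true },
  { query: 'say "hello"', should_trigger: false },
]);
console.log(sample);
```

One thing this sketch makes visible: with an empty item list, `cases:` is emitted with no children, which YAML reads as a null value rather than an empty mapping. Whether `scripts/validate.py` accepts that is not shown in this diff.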