PromptPasture · olegshulyakov · May 25, 2026 · May 25, 2026
diff --git a/.agents/skills/create-skill/SKILL.md b/.agents/skills/create-skill/SKILL.md
@@ -8,7 +8,7 @@ tags:
   - authoring
 metadata:
   author: Anthropic
-  version: "1.7.0"
+  version: "1.8.0"
   source: github.com/anthropics/skills
   catalog: utility
   category: meta

diff --git a/.agents/skills/create-skill/evals/evals.json b/.agents/skills/create-skill/evals/evals.json
@@ -12,6 +12,7 @@
         "Produces or revises SKILL.md instructions",
         "Keeps metadata and body budgets in mind",
         "Uses references only when they reduce main-file complexity",
+        "Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
         "Avoids placeholder bundled resources",
         "Preserves portability and safety"
       ]
@@ -27,6 +28,7 @@
         "Produces or revises SKILL.md instructions",
         "Keeps metadata and body budgets in mind",
         "Uses references only when they reduce main-file complexity",
+        "Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
         "Avoids placeholder bundled resources",
         "Preserves portability and safety"
       ]
@@ -42,6 +44,7 @@
         "Produces or revises SKILL.md instructions",
         "Keeps metadata and body budgets in mind",
         "Uses references only when they reduce main-file complexity",
+        "Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
         "Avoids placeholder bundled resources",
         "Preserves portability and safety"
       ]
@@ -57,6 +60,7 @@
         "Produces or revises SKILL.md instructions",
         "Keeps metadata and body budgets in mind",
         "Uses references only when they reduce main-file complexity",
+        "Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
         "Avoids placeholder bundled resources",
         "Preserves portability and safety"
       ]
@@ -72,6 +76,7 @@
         "Produces or revises SKILL.md instructions",
         "Keeps metadata and body budgets in mind",
         "Uses references only when they reduce main-file complexity",
+        "Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
         "Avoids placeholder bundled resources",
         "Preserves portability and safety"
       ]
@@ -87,6 +92,7 @@
         "Produces or revises SKILL.md instructions",
         "Keeps metadata and body budgets in mind",
         "Uses references only when they reduce main-file complexity",
+        "Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
         "Avoids placeholder bundled resources",
         "Preserves portability and safety"
       ]
@@ -102,6 +108,7 @@
         "Produces or revises SKILL.md instructions",
         "Keeps metadata and body budgets in mind",
         "Uses references only when they reduce main-file complexity",
+        "Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
         "Avoids placeholder bundled resources",
         "Preserves portability and safety"
       ]
@@ -117,6 +124,7 @@
         "Produces or revises SKILL.md instructions",
         "Keeps metadata and body budgets in mind",
         "Uses references only when they reduce main-file complexity",
+        "Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
         "Avoids placeholder bundled resources",
         "Preserves portability and safety"
       ]
@@ -251,7 +259,9 @@
         "Routes to evaluation guidance",
         "Creates realistic prompt-level eval cases",
         "Includes route or trigger boundary coverage",
-        "Uses objective expectations where possible",
+        "Derives assertions from the skill contract when objective checks are useful",
+        "Covers distinct failure modes or input classes without redundant assertions",
+        "Includes at least one negative assertion for evals with objective checks",
         "Keeps evals inside the skill folder",
         "Mentions reproducible iteration or benchmark workflow when relevant"
       ]
@@ -266,7 +276,9 @@
         "Routes to evaluation guidance",
         "Creates realistic prompt-level eval cases",
         "Includes route or trigger boundary coverage",
-        "Uses objective expectations where possible",
+        "Derives assertions from the skill contract when objective checks are useful",
+        "Covers distinct failure modes or input classes without redundant assertions",
+        "Includes at least one negative assertion for evals with objective checks",
         "Keeps evals inside the skill folder",
         "Mentions reproducible iteration or benchmark workflow when relevant"
       ]
@@ -281,7 +293,9 @@
         "Routes to evaluation guidance",
         "Creates realistic prompt-level eval cases",
         "Includes route or trigger boundary coverage",
-        "Uses objective expectations where possible",
+        "Derives assertions from the skill contract when objective checks are useful",
+        "Covers distinct failure modes or input classes without redundant assertions",
+        "Includes at least one negative assertion for evals with objective checks",
         "Keeps evals inside the skill folder",
         "Mentions reproducible iteration or benchmark workflow when relevant"
       ]
@@ -296,7 +310,9 @@
         "Routes to evaluation guidance",
         "Creates realistic prompt-level eval cases",
         "Includes route or trigger boundary coverage",
-        "Uses objective expectations where possible",
+        "Derives assertions from the skill contract when objective checks are useful",
+        "Covers distinct failure modes or input classes without redundant assertions",
+        "Includes at least one negative assertion for evals with objective checks",
         "Keeps evals inside the skill folder",
         "Mentions reproducible iteration or benchmark workflow when relevant"
       ]
@@ -311,7 +327,9 @@
         "Routes to evaluation guidance",
         "Creates realistic prompt-level eval cases",
         "Includes route or trigger boundary coverage",
-        "Uses objective expectations where possible",
+        "Derives assertions from the skill contract when objective checks are useful",
+        "Covers distinct failure modes or input classes without redundant assertions",
+        "Includes at least one negative assertion for evals with objective checks",
         "Keeps evals inside the skill folder",
         "Mentions reproducible iteration or benchmark workflow when relevant"
       ]
@@ -326,7 +344,9 @@
         "Routes to evaluation guidance",
         "Creates realistic prompt-level eval cases",
         "Includes route or trigger boundary coverage",
-        "Uses objective expectations where possible",
+        "Derives assertions from the skill contract when objective checks are useful",
+        "Covers distinct failure modes or input classes without redundant assertions",
+        "Includes at least one negative assertion for evals with objective checks",
         "Keeps evals inside the skill folder",
         "Mentions reproducible iteration or benchmark workflow when relevant"
       ]
@@ -341,7 +361,9 @@
         "Routes to evaluation guidance",
         "Creates realistic prompt-level eval cases",
         "Includes route or trigger boundary coverage",
-        "Uses objective expectations where possible",
+        "Derives assertions from the skill contract when objective checks are useful",
+        "Covers distinct failure modes or input classes without redundant assertions",
+        "Includes at least one negative assertion for evals with objective checks",
         "Keeps evals inside the skill folder",
         "Mentions reproducible iteration or benchmark workflow when relevant"
       ]
@@ -356,7 +378,9 @@
         "Routes to evaluation guidance",
         "Creates realistic prompt-level eval cases",
         "Includes route or trigger boundary coverage",
-        "Uses objective expectations where possible",
+        "Derives assertions from the skill contract when objective checks are useful",
+        "Covers distinct failure modes or input classes without redundant assertions",
+        "Includes at least one negative assertion for evals with objective checks",
         "Keeps evals inside the skill folder",
         "Mentions reproducible iteration or benchmark workflow when relevant"
       ]

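The eval expectations added above ask for three things: assertions derived from the skill contract, coverage of distinct failure modes without redundancy, and at least one negative assertion per objective eval. The Python sketch below grades one hypothetical eval case as a rough illustration only. The `must_contain` and `must_not_contain` fields, the sample prompt, and the `grade` and `has_negative_assertion` helpers are all assumptions. They are not the schema or tooling that `evals.json` actually uses.

```python
from __future__ import annotations

# Hypothetical eval case: field names are illustrative, not the real evals.json schema.
CASE = {
    "prompt": "Create a skill that formats changelog entries.",
    # Positive assertions derived from the skill contract (what must appear).
    "must_contain": ["name:", "description:"],
    # Negative assertions: known failure modes that must NOT appear.
    "must_not_contain": ["TODO", "<placeholder>"],
}


def grade(output: str, case: dict) -> dict[str, bool]:
    """Return one pass/fail result per assertion so failures stay distinguishable."""
    results: dict[str, bool] = {}
    for text in case.get("must_contain", []):
        results[f"contains {text!r}"] = text in output
    for text in case.get("must_not_contain", []):
        results[f"omits {text!r}"] = text not in output
    return results


def has_negative_assertion(case: dict) -> bool:
    """Check the rule that an objective eval includes at least one negative assertion."""
    return bool(case.get("must_not_contain"))


if __name__ == "__main__":
    sample = "---\nname: changelog-formatter\ndescription: Formats entries.\n---\n"
    print(grade(sample, CASE))
    print(has_negative_assertion(CASE))
```

Keeping one result per assertion, instead of a single pass/fail, shows which failure mode a run hit. It also helps spot redundancy: assertions that always pass or fail together may be duplicates.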
diff --git a/.agents/skills/create-skill/references/agent-compatibility.md b/.agents/skills/create-skill/references/agent-compatibility.md
@@ -1,14 +1,41 @@
 # Agent Compatibility
 
-Use this reference when adapting skill creation or evaluation to a specific runtime.
+Use this reference when adapting skill creation, evaluation, or packaging to a specific agent runtime.
+
+The core rule is simple: preserve the skill's behavior, and replace only the runtime mechanics that the current environment lacks.
+
+## Start From Capability Gaps
+
+Before changing a workflow, identify what the current agent can and cannot do.
+
+Check for:
+
+- **Subagents:** Can it run skill and baseline attempts in parallel?
+- **Trigger telemetry:** Can it tell whether a skill would activate?
+- **File access:** Can it read and write the target skill directory?
+- **Browser/display:** Can it show the review UI?
+- **Command shape:** Does it take prompts through stdin, arguments, files, or an interactive session?
+
+Do not rewrite portable instructions just because a runtime lacks one convenience. Adapt the missing mechanism, not the skill's intent.
+
+---
 
 ## Agents Without Subagents
 
-Follow the same draft, test, review, and improve loop, but run test cases serially yourself. Skip baseline comparisons unless another local mechanism can produce them fairly.
+Follow the same draft, test, review, and improve loop, but run test cases serially yourself.
+
+Baseline comparisons are weaker without isolated runs. Skip them unless another local mechanism can produce them fairly. Treat results as qualitative unless deterministic assertions can be checked locally.
+
+When review UI support is limited, use one of these fallbacks:
+
+- **Static review:** save a static HTML review file.
+- **Inline summary:** summarize outputs directly in the conversation.
+- **Focused questions:** ask concise inline review questions.
+- **Deterministic checks:** use scripts for checks that do not need human judgment.
 
-Present outputs directly in the conversation or save files for the user to inspect. If a browser is unavailable, skip the live review server and use a static HTML review file or concise inline review prompts.
+### What Changes
 
-Quantitative benchmarking is less meaningful without isolated baseline runs. Prioritize qualitative feedback unless deterministic assertions can be checked locally.
+The process gets slower and less statistically clean. The standard should not get lower. Keep transcripts, outputs, and grading results organized so another reviewer can reproduce the judgment.
 
 ---
 
@@ -24,7 +51,7 @@ python -m scripts.run_loop \
   --verbose
 ```
 
-Use the user's normal Claude Code configuration.
+Use the user's normal Claude Code configuration. Do not silently switch models or tool settings, because trigger behavior should reflect the user's actual environment.
 
 ---
 
@@ -40,7 +67,7 @@ python -m scripts.run_loop \
   --verbose
 ```
 
-For CLIs that need arguments or files instead of stdin, use:
+For CLIs that need arguments or files instead of stdin, use `--agent-command`:
 
 ```bash
 python -m scripts.run_loop \
@@ -51,20 +78,49 @@ python -m scripts.run_loop \
   --verbose
 ```
 
+Use `{prompt}` when the CLI accepts inline prompt text. Use `{prompt_file}` when prompt files are safer for quoting, long inputs, or multiline content.
+
 ---
 
 ## Cowork
 
-Cowork has subagents, so parallel skill and baseline runs can work. If timeouts become a problem, run prompts in smaller batches.
+Cowork has subagents, so parallel skill and baseline runs can work. If timeouts become a problem, run prompts in smaller batches instead of dropping coverage.
+
+Cowork may not have a display. Generate a static review file with:
 
-Cowork may not have a display. Generate a static review file with `eval-viewer/generate_review.py --static <output_path>` and share that path. Use the generated review UI before revising from test outputs.
+```bash
+python <skill-creator-path>/eval-viewer/generate_review.py \
+  <iteration-dir> \
+  --skill-name "<name>" \
+  --benchmark <iteration-dir>/benchmark.json \
+  --static <output_path>
+```
 
-When feedback is downloaded as `feedback.json`, copy it into the current iteration directory before continuing.
+Use the generated review UI before revising from test outputs. When feedback is downloaded as `feedback.json`, copy it into the current iteration directory before continuing.
 
 ---
 
 ## Updating Installed Skills
 
-Preserve the original skill directory name and `name` frontmatter. If an installed skill path is read-only, copy it to a writable location, edit the copy, and package from there.
+Preserve the original skill directory name and `name` frontmatter. Installed skills often rely on those identifiers for discovery.
+
+If the installed skill path is read-only:
+
+1. Copy the skill to a writable location.
+2. Edit and validate the copy.
+3. Package from the copy.
+4. Tell the user which artifact or directory should replace the installed version.
 
 When packaging manually, stage temporary package contents in `/tmp/` first if direct writes fail.
+
+---
+
+## Portability Checklist
+
+Before finishing a compatibility adaptation, verify:
+
+- **Core behavior:** the workflow still describes the same skill behavior.
+- **Runtime isolation:** runtime-specific commands are isolated to compatibility notes.
+- **Fallbacks:** unavailable features have explicit alternatives.
+- **Result confidence:** eval results are described with the right confidence level.
+- **Packaging:** package and install instructions match the user's actual runtime.
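The `{prompt}` and `{prompt_file}` placeholders for `--agent-command` are easier to follow with a concrete example. Below is a minimal Python sketch of how a runner might expand such a template into an argument list. It is not the implementation in `scripts.run_loop`. The `expand_agent_command` helper, the `my-agent` CLI name, the `prompt.txt` file name, and the use of `shlex` to split the template are all assumptions made for illustration.

```python
from __future__ import annotations

import shlex
import tempfile
from pathlib import Path


def expand_agent_command(template: str, prompt: str, workdir: Path) -> list[str]:
    """Expand a hypothetical --agent-command template into an argv list.

    {prompt} becomes the inline prompt text inside a single argument.
    {prompt_file} becomes the path of a file holding the prompt, which
    sidesteps quoting problems for long or multiline prompts.
    """
    argv: list[str] = []
    prompt_path: Path | None = None
    # Split the template the way a POSIX shell would, so quoted flags stay intact.
    for token in shlex.split(template):
        if "{prompt_file}" in token:
            if prompt_path is None:
                # Write the prompt once and reuse the file for repeated placeholders.
                prompt_path = workdir / "prompt.txt"
                prompt_path.write_text(prompt, encoding="utf-8")
            token = token.replace("{prompt_file}", str(prompt_path))
        if "{prompt}" in token:
            # Substitute after splitting, so spaces or newlines in the prompt
            # never turn into extra arguments.
            token = token.replace("{prompt}", prompt)
        argv.append(token)
    return argv


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        print(expand_agent_command("my-agent run --prompt {prompt}", "Say hi\nthen stop", workdir))
        print(expand_agent_command("my-agent run --input {prompt_file}", "Say hi", workdir))
```

The result is an argument list rather than a shell string, so it could be passed to `subprocess.run(argv)` without extra quoting. The file-based form still helps in two cases: when a CLI expects a path, or when a prompt would exceed the platform's argument-length limits.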