Merged
2 changes: 1 addition & 1 deletion .agents/skills/create-skill/SKILL.md
@@ -8,7 +8,7 @@ tags:
- authoring
metadata:
author: Anthropic
version: "1.7.0"
version: "1.8.0"
source: github.com/anthropics/skills
catalog: utility
category: meta
40 changes: 32 additions & 8 deletions .agents/skills/create-skill/evals/evals.json
@@ -12,6 +12,7 @@
"Produces or revises SKILL.md instructions",
"Keeps metadata and body budgets in mind",
"Uses references only when they reduce main-file complexity",
"Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
"Avoids placeholder bundled resources",
"Preserves portability and safety"
]
@@ -27,6 +28,7 @@
"Produces or revises SKILL.md instructions",
"Keeps metadata and body budgets in mind",
"Uses references only when they reduce main-file complexity",
"Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
"Avoids placeholder bundled resources",
"Preserves portability and safety"
]
@@ -42,6 +44,7 @@
"Produces or revises SKILL.md instructions",
"Keeps metadata and body budgets in mind",
"Uses references only when they reduce main-file complexity",
"Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
"Avoids placeholder bundled resources",
"Preserves portability and safety"
]
@@ -57,6 +60,7 @@
"Produces or revises SKILL.md instructions",
"Keeps metadata and body budgets in mind",
"Uses references only when they reduce main-file complexity",
"Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
"Avoids placeholder bundled resources",
"Preserves portability and safety"
]
@@ -72,6 +76,7 @@
"Produces or revises SKILL.md instructions",
"Keeps metadata and body budgets in mind",
"Uses references only when they reduce main-file complexity",
"Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
"Avoids placeholder bundled resources",
"Preserves portability and safety"
]
@@ -87,6 +92,7 @@
"Produces or revises SKILL.md instructions",
"Keeps metadata and body budgets in mind",
"Uses references only when they reduce main-file complexity",
"Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
"Avoids placeholder bundled resources",
"Preserves portability and safety"
]
@@ -102,6 +108,7 @@
"Produces or revises SKILL.md instructions",
"Keeps metadata and body budgets in mind",
"Uses references only when they reduce main-file complexity",
"Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
"Avoids placeholder bundled resources",
"Preserves portability and safety"
]
@@ -117,6 +124,7 @@
"Produces or revises SKILL.md instructions",
"Keeps metadata and body budgets in mind",
"Uses references only when they reduce main-file complexity",
"Uses the reference format that best teaches the behavior instead of defaulting to terse bullets",
"Avoids placeholder bundled resources",
"Preserves portability and safety"
]
@@ -251,7 +259,9 @@
"Routes to evaluation guidance",
"Creates realistic prompt-level eval cases",
"Includes route or trigger boundary coverage",
"Uses objective expectations where possible",
"Derives assertions from the skill contract when objective checks are useful",
"Covers distinct failure modes or input classes without redundant assertions",
"Includes at least one negative assertion for evals with objective checks",
"Keeps evals inside the skill folder",
"Mentions reproducible iteration or benchmark workflow when relevant"
]
@@ -266,7 +276,9 @@
"Routes to evaluation guidance",
"Creates realistic prompt-level eval cases",
"Includes route or trigger boundary coverage",
"Uses objective expectations where possible",
"Derives assertions from the skill contract when objective checks are useful",
"Covers distinct failure modes or input classes without redundant assertions",
"Includes at least one negative assertion for evals with objective checks",
"Keeps evals inside the skill folder",
"Mentions reproducible iteration or benchmark workflow when relevant"
]
@@ -281,7 +293,9 @@
"Routes to evaluation guidance",
"Creates realistic prompt-level eval cases",
"Includes route or trigger boundary coverage",
"Uses objective expectations where possible",
"Derives assertions from the skill contract when objective checks are useful",
"Covers distinct failure modes or input classes without redundant assertions",
"Includes at least one negative assertion for evals with objective checks",
"Keeps evals inside the skill folder",
"Mentions reproducible iteration or benchmark workflow when relevant"
]
@@ -296,7 +310,9 @@
"Routes to evaluation guidance",
"Creates realistic prompt-level eval cases",
"Includes route or trigger boundary coverage",
"Uses objective expectations where possible",
"Derives assertions from the skill contract when objective checks are useful",
"Covers distinct failure modes or input classes without redundant assertions",
"Includes at least one negative assertion for evals with objective checks",
"Keeps evals inside the skill folder",
"Mentions reproducible iteration or benchmark workflow when relevant"
]
@@ -311,7 +327,9 @@
"Routes to evaluation guidance",
"Creates realistic prompt-level eval cases",
"Includes route or trigger boundary coverage",
"Uses objective expectations where possible",
"Derives assertions from the skill contract when objective checks are useful",
"Covers distinct failure modes or input classes without redundant assertions",
"Includes at least one negative assertion for evals with objective checks",
"Keeps evals inside the skill folder",
"Mentions reproducible iteration or benchmark workflow when relevant"
]
@@ -326,7 +344,9 @@
"Routes to evaluation guidance",
"Creates realistic prompt-level eval cases",
"Includes route or trigger boundary coverage",
"Uses objective expectations where possible",
"Derives assertions from the skill contract when objective checks are useful",
"Covers distinct failure modes or input classes without redundant assertions",
"Includes at least one negative assertion for evals with objective checks",
"Keeps evals inside the skill folder",
"Mentions reproducible iteration or benchmark workflow when relevant"
]
@@ -341,7 +361,9 @@
"Routes to evaluation guidance",
"Creates realistic prompt-level eval cases",
"Includes route or trigger boundary coverage",
"Uses objective expectations where possible",
"Derives assertions from the skill contract when objective checks are useful",
"Covers distinct failure modes or input classes without redundant assertions",
"Includes at least one negative assertion for evals with objective checks",
"Keeps evals inside the skill folder",
"Mentions reproducible iteration or benchmark workflow when relevant"
]
@@ -356,7 +378,9 @@
"Routes to evaluation guidance",
"Creates realistic prompt-level eval cases",
"Includes route or trigger boundary coverage",
"Uses objective expectations where possible",
"Derives assertions from the skill contract when objective checks are useful",
"Covers distinct failure modes or input classes without redundant assertions",
"Includes at least one negative assertion for evals with objective checks",
"Keeps evals inside the skill folder",
"Mentions reproducible iteration or benchmark workflow when relevant"
]
76 changes: 66 additions & 10 deletions .agents/skills/create-skill/references/agent-compatibility.md
@@ -1,14 +1,41 @@
# Agent Compatibility

Use this reference when adapting skill creation or evaluation to a specific runtime.
Use this reference when adapting skill creation, evaluation, or packaging to a specific agent runtime.

The core rule is simple: preserve the skill's behavior, then swap only the runtime mechanics that do not exist in the current environment.

## Start From Capability Gaps

Before changing a workflow, identify what the current agent can and cannot do.

Check for:

- **Subagents:** Can it run skill and baseline attempts in parallel?
- **Trigger telemetry:** Can it tell whether a skill would activate?
- **File access:** Can it read and write the target skill directory?
- **Browser/display:** Can it show the review UI?
- **Command shape:** Does it take prompts through stdin, arguments, files, or an interactive session?

Do not rewrite portable instructions just because a runtime lacks one convenience. Adapt the missing mechanism, not the skill's intent.
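
A few of the checks above can be approximated with a quick local probe. The sketch below is illustrative only: `probe_capabilities` is a hypothetical helper, not part of the skill's scripts, and the display check is a heuristic that assumes headless Linux hosts leave `DISPLAY` and `WAYLAND_DISPLAY` unset. Subagent support and trigger telemetry cannot be detected this way and still need to be confirmed against the runtime's documentation.

```python
import os
import sys
from pathlib import Path


def probe_capabilities(skill_dir: str) -> dict:
    """Return a rough map of what the current environment appears to support.

    Hypothetical helper: the keys and heuristics are assumptions, not a
    contract defined by the skill.
    """
    path = Path(skill_dir)
    has_display_var = bool(
        os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY")
    )
    return {
        # File access: can the target skill directory be read and written?
        "file_read": path.is_dir() and os.access(path, os.R_OK),
        "file_write": path.is_dir() and os.access(path, os.W_OK),
        # Browser/display: heuristic only; desktop macOS and Windows are
        # assumed to have a display even without these variables.
        "display": has_display_var or sys.platform in ("darwin", "win32"),
    }
```

A missing capability in the result is a prompt to pick a fallback, not a reason to rewrite the skill's instructions.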

---

## Agents Without Subagents

Follow the same draft, test, review, and improve loop, but run test cases serially yourself. Skip baseline comparisons unless another local mechanism can produce them fairly.
Follow the same draft, test, review, and improve loop, but run test cases serially yourself.

Baseline comparisons are weaker without isolated runs. Skip them unless another local mechanism can produce them fairly. Treat results as qualitative unless deterministic assertions can be checked locally.

When review UI support is limited, use one of these fallbacks:

- **Static review:** save a static HTML review file.
- **Inline summary:** summarize outputs directly in the conversation.
- **Focused questions:** ask concise inline review questions.
- **Deterministic checks:** use scripts for checks that do not need human judgment.

Present outputs directly in the conversation or save files for the user to inspect. If a browser is unavailable, skip the live review server and use a static HTML review file or concise inline review prompts.
### What changes

Quantitative benchmarking is less meaningful without isolated baseline runs. Prioritize qualitative feedback unless deterministic assertions can be checked locally.
The process gets slower and less statistically clean. The standard should not get lower. Keep transcripts, outputs, and grading results organized so another reviewer can reproduce the judgment.

---

@@ -24,7 +51,7 @@ python -m scripts.run_loop \
--verbose
```

Use the user's normal Claude Code configuration.
Use the user's normal Claude Code configuration. Do not silently switch models or tool settings, because trigger behavior should reflect the user's actual environment.

---

@@ -40,7 +67,7 @@ python -m scripts.run_loop \
--verbose
```

For CLIs that need arguments or files instead of stdin, use:
For CLIs that need arguments or files instead of stdin, use `--agent-command`:

```bash
python -m scripts.run_loop \
@@ -51,20 +78,49 @@ python -m scripts.run_loop \
--verbose
```

Use `{prompt}` when the CLI accepts inline prompt text. Use `{prompt_file}` when prompt files are safer for quoting, long inputs, or multiline content.
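
To show why `{prompt_file}` is safer for awkward input, here is a minimal sketch of how a command template with these placeholders could be expanded. It is not the actual `scripts.run_loop` implementation, which is not shown here: `expand_agent_command`, the `prompt.txt` file name, and the `my-agent` CLI in the usage note are all hypothetical.

```python
import shlex
from pathlib import Path


def expand_agent_command(template: str, prompt: str, work_dir: str) -> list:
    """Expand a command template into an argv list (hypothetical helper)."""
    # Split the template first, so the prompt text is never re-parsed as
    # shell syntax and always stays inside a single argument.
    args = shlex.split(template)

    prompt_file = None
    if any("{prompt_file}" in arg for arg in args):
        # Write the prompt to disk so quotes and newlines never reach argv.
        prompt_file = Path(work_dir) / "prompt.txt"
        prompt_file.write_text(prompt, encoding="utf-8")

    expanded = []
    for arg in args:
        if prompt_file is not None:
            arg = arg.replace("{prompt_file}", str(prompt_file))
        expanded.append(arg.replace("{prompt}", prompt))
    return expanded
```

With a template such as `my-agent --prompt {prompt}`, the whole prompt becomes one argument even when it contains spaces or quotes. With `my-agent --file {prompt_file}`, only a path is passed, which avoids argument-length limits and multiline quoting problems.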

---

## Cowork

Cowork has subagents, so parallel skill and baseline runs can work. If timeouts become a problem, run prompts in smaller batches.
Cowork has subagents, so parallel skill and baseline runs can work. If timeouts become a problem, run prompts in smaller batches instead of dropping coverage.

Cowork may not have a display. Generate a static review file with:

Cowork may not have a display. Generate a static review file with `eval-viewer/generate_review.py --static <output_path>` and share that path. Use the generated review UI before revising from test outputs.
```bash
python <skill-creator-path>/eval-viewer/generate_review.py \
<iteration-dir> \
--skill-name "<name>" \
--benchmark <iteration-dir>/benchmark.json \
--static <output_path>
```

When feedback is downloaded as `feedback.json`, copy it into the current iteration directory before continuing.
Use the generated review UI before revising from test outputs. When feedback is downloaded as `feedback.json`, copy it into the current iteration directory before continuing.

---

## Updating Installed Skills

Preserve the original skill directory name and `name` frontmatter. If an installed skill path is read-only, copy it to a writable location, edit the copy, and package from there.
Preserve the original skill directory name and `name` frontmatter. Installed skills often rely on those identifiers for discovery.

If the installed skill path is read-only:

1. Copy the skill to a writable location.
2. Edit and validate the copy.
3. Package from the copy.
4. Tell the user which artifact or directory should replace the installed version.

When packaging manually, stage temporary package contents in `/tmp/` first if direct writes fail.
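
The copy step above can be sketched as follows. `copy_skill_for_editing` is a hypothetical helper, not one of the skill's scripts, and it assumes the caller chooses the scratch location. It keeps the directory name, and it restores the owner write bit because `shutil.copytree` preserves permission bits from the source.

```python
import shutil
import stat
from pathlib import Path


def copy_skill_for_editing(installed_skill: str, scratch_root: str) -> Path:
    """Copy a skill directory into scratch_root, keeping its directory name."""
    src = Path(installed_skill)
    # Same directory name as the installed skill, so discovery still works.
    dest = Path(scratch_root) / src.name
    shutil.copytree(src, dest)
    # copytree preserves permission bits, so a read-only source yields a
    # read-only copy; add the owner write bit to everything that was copied.
    for path in [dest, *dest.rglob("*")]:
        path.chmod(path.stat().st_mode | stat.S_IWUSR)
    return dest
```

Edit and validate the returned directory, package from it, and report its path to the user as the replacement for the installed version.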

---

## Portability Checklist

Before finishing a compatibility adaptation, verify:

- **Core behavior:** the workflow still describes the same skill behavior.
- **Runtime isolation:** runtime-specific commands are isolated to compatibility notes.
- **Fallbacks:** unavailable features have explicit alternatives.
- **Result confidence:** eval results are described with the right confidence level.
- **Packaging:** package and install instructions match the user's actual runtime.