adapters/claude-code: use CC skill system for inject; split skill telemetry fields (#14) by lec77 · Pull Request #20 · SJTU-IPADS/SkVM

lec77 · 2026-05-14T10:43:39Z

Linked issue

Closes #14.

Summary

The claude-code adapter's inject mode bypassed CC's native skill system entirely — it dumped the skill content as generic text through --append-system-prompt. CC never registered it as a skill (no init-event listing, no Skill tool dispatch), and the only way SkVM could tell whether the skill was "loaded" was a sentinel-echo heuristic that fails systematically on thinking-mode models. The downstream effect was that DeepSeek-thinking / GLM-thinking jit-optimize runs reported skillLoaded=false, the optimizer abstained at round 1 with infraBlocked, and the entire skill-iteration loop died before producing a single proposal.

This PR rewires inject to use CC's actual skill system (.claude/skills/<name>/SKILL.md + a hard MUST directive forcing Skill tool invocation), and splits the conflated RunResult.skillLoaded field into two semantically distinct signals across all 7 adapters: skillProvided (structural — did we get the skill into the agent's context?) and skillObserved (behavioral — did we see the agent engage?). The mode-aware abstain gate in jit-optimize/workspace.ts now only treats inject-mode skillProvided=false as an infra failure; discover-mode non-loads are merely surfaced as a note.

Changes

Schema & data plumbing (Tasks 1–2):

core/types.ts: RunResultSchema gains skillProvided?, skillObserved?, skillMode?: "inject" | "discover". skillLoaded retained for one release as a @deprecated mirror so on-disk ~/.skvm/proposals/ evidence files keep parsing.
jit-optimize/types.ts + jit-optimize/evidence.ts: same three fields land on RunMeta; buildRunMeta forwards them and mirrors skillProvided → skillLoaded (one-way — legacy skillLoaded-only evidence is left alone so older runs aren't retroactively given a skillProvided value we can't actually substantiate).

Optimizer gating (Task 3):

jit-optimize/workspace.ts:536 — the WARNING line that drives optimizer abstain decisions is now mode-aware:
- inject + provided=false → WARNING: skill failed to load (inject mode — likely infra issue) (drives abstain as before)
- discover + provided=false → WARNING: skill not recognized by harness (discover mode — may be description mismatch) (no abstain — discover legitimately allows the model to skip loading)
- Legacy evidence without skillMode → original WARNING: skill was not loaded wording preserved.
skillObserved is intentionally NOT surfaced in evidence summaries — it would produce systematic false negatives on thinking-mode models.

claude-code adapter rewrite (Tasks 4–6):

New buildClaudeCodeInjectArtifacts / buildClaudeCodeDiscoverArtifacts factor the on-disk skill-file work out of run() so it's unit-testable without spawning CC.
Inject path now writes <workDir>/.claude/skills/<name>/SKILL.md (same path as discover) AND sets --append-system-prompt to a hard MUST directive: You have access to a project skill named `<name>` (<desc>). For this task you MUST invoke the Skill tool with name="<name>" before any other action…
skillProvided is now read from CC's system.init event's skills[] array — a structural, model-independent signal that resolves the thinking-mode false-negative cleanly. skillObserved is read from Skill/skill tool_use events with matching input.name.
detectSkillDiscover is split into detectSkillProvided_Discover (init-event) and detectSkillObserved_Discover (Skill tool-use). The old SKILL_INJECT_SENTINEL / injectedSystemPrompt / detectSkillInject machinery is deleted.

Peer adapters — field alignment (Tasks 7–12, one commit each):

bare-agent, opencode, pi, openclaw, hermes, jiuwenclaw keep their existing inject mechanisms (system-message concat / CONTEXT.md / AGENTS.md / BOOTSTRAP.md / prompt prepend). Each harness's native auto-load file IS the documented inject path; only claude-code had the wrong-mechanism bug.
All six rewire their skillLoaded plumbing to skillProvided + skillObserved, set skillMode, and write the deprecated skillLoaded mirror. The "agent produced steps → assume skill was used" inference is dropped — skillProvided=true is now set structurally at the inject step site, so the heuristic is redundant. Snippet-match and skill-tool-call detections become skillObserved.
bare-agent's <load-skill> protocol is the one place where structural and behavioral fire simultaneously — both fields go true together.

Bench report wiring (Task 13):

bench/types.ts ConditionResult exposes the two new fields; bench/conditions.ts toConditionResult propagates them. skillLoaded deprecated alias retained for older JSON reports.

Test plan

bunx tsc --noEmit — clean
bun test — 793 pass / 2 skip / 0 fail (was 777 on origin/main; net +16 tests, no regressions)
New tests added for: schema field shape, buildRunMeta forwarding (including legacy-only skillLoaded back-compat), mode-aware workspace WARNING wordings, the two new claude-code detection helpers, claude-code inject artifact builders (writes the skill file + emits MUST directive), thinking-mode-safe skillProvided detection from init events alone, bare-agent parallel-assertion coverage.
Live smoke test against a real model — not yet run. Reviewer or CI can validate against DeepSeek-thinking / GLM-thinking once landed. The integration tests cover the structural contract; the live test is only needed to confirm CC's real CLI accepts the .claude/skills/ + --append-system-prompt directive combination at runtime.

Notes for reviewers

A few design choices worth double-checking:

skillMode semantics retained. The inject / discover binary stays — but the meaning is now strictly "loading bias" (forced vs discoverable), independent of delivery mechanism. Each adapter chooses the best mechanism its harness offers. For claude-code, both modes now write to .claude/skills/; the only difference is whether a MUST directive accompanies it. For the other adapters, the existing per-harness inject path (CONTEXT.md / AGENTS.md / etc.) is unchanged because those harnesses don't expose a richer skill subsystem than their auto-load file.
Why a system prompt directive and not /skill <name> in the user prompt. The original issue suggested putting /skill <name> in the prompt to trigger the skill. /skill is interactive slash-command syntax — its behavior in claude -p non-interactive mode is undocumented and would have to be verified empirically. System-prompt directives are the documented headless path; they also let us keep the inject/discover semantic distinction crisp (directive is the only difference between the two modes' artifacts).
Forward-only mirror for the deprecated skillLoaded. buildRunMeta mirrors skillProvided → skillLoaded but never the reverse: if an older adapter writes only skillLoaded, the new skillProvided on the output stays undefined. Rationale: the old skillLoaded field merged structural and behavioral semantics, so retroactively claiming skillProvided=true from a legacy record would be lying. Consumers (workspace gate) use skillProvided ?? skillLoaded for explicit fallback.
Dropped the "agent ran → skill was used" inference in peer adapters. Tests like if (result.steps.length > 0) skillLoaded = true were always weak — they don't tell you anything about the skill, just that the agent produced output. With skillProvided now set structurally at the inject step (Bun.write(...) succeeded), the inference is redundant. Snippet-echo and skill-tool-call detections survive as skillObserved signals (diagnostic only).
skillObserved is intentionally NOT surfaced in optimizer evidence summaries. It would systematically false-negative on thinking-mode models (no echo to detect) and create noise. Optimizer code that wants the signal can read runMeta.skillObserved directly.
PR scope vs. broader inject redesign. A wider redesign — three modes (forced / discoverable / none) or richer per-harness skill mechanisms — was considered during brainstorming and intentionally deferred. This PR fixes the specific bug in issue [bug] Claude Code adapter inject模式实现Bug报告 #14 and the field-semantics conflation that amplified it. Larger architectural changes can land as a follow-up if useful.

Checklist

Linked an issue ([bug] Claude Code adapter inject模式实现Bug报告 #14)
Commit messages follow <scope>: <summary> style
Added or updated tests where appropriate
No user-visible CLI flags / on-disk paths changed — no README/docs update needed (the ~/.skvm/proposals/ schema is back-compat via the deprecated skillLoaded alias)
No unrelated changes mixed in (the 13 commits all serve this one logical change; squash if you prefer fewer atomic commits)

Splits the conflated skillLoaded field into: - skillProvided: structural — was the skill placed in agent's context? - skillObserved: behavioral — visible engagement evidence (diagnostic only). - skillMode: which mode (inject/discover) the adapter ran in. skillLoaded stays as a deprecated alias for one release for on-disk evidence file back-compat. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nMeta Mirrors the RunResult schema split from the previous commit: RunMeta gains skillProvided / skillObserved / skillMode; buildRunMeta forwards them from RunResult. skillLoaded stays as a deprecated alias. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The evidence-summary WARNING that drives optimizer abstain decisions now distinguishes inject-mode infra failures (skill never reached the model) from discover-mode non-loads (description mismatch / model didn't engage). Legacy evidence files without skillMode get the original wording for back-compat. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…d helpers The old detectSkillDiscover conflated two qualitatively distinct signals: the structural one (did CC's system.init event list the skill?) and the behavioral one (did the model invoke the Skill tool?). Splitting them lets downstream code report the right value to skillProvided vs skillObserved. detectSkillDiscover stays as a deprecated back-compat shim that ORs the two new helpers; it goes away with the SKILL_INJECT_SENTINEL machinery in the next commit. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Issue SJTU-IPADS#14: the previous inject path dumped the skill content into CC's system prompt via --append-system-prompt, bypassing CC's native skill system entirely — the skill never appeared in init event's skills[], the model couldn't invoke it via the Skill tool, and the only way to detect 'load' was a sentinel-echo heuristic that systematically failed on thinking-mode models. New inject path: - Writes the skill to .claude/skills/<name>/SKILL.md (same as discover). - Adds a hard MUST directive via --append-system-prompt instructing the model to invoke the Skill tool with the skill name before any other action. - Uses CC's system.init event as the structural skillProvided signal (model-independent, robust against thinking-mode). - Uses Skill tool_use as the behavioral skillObserved signal. The artifact-building logic is factored into two new exports (buildClaudeCodeInjectArtifacts, buildClaudeCodeDiscoverArtifacts) so the on-disk effects can be unit-tested without spawning CC. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

These were heuristics for the old text-append inject path that Task 5 replaced with CC's native skill system. Without that path nothing emits the sentinel, nothing reads it, and the heuristic only ever produced false negatives on thinking-mode models. Also drops the deprecated detectSkillDiscover shim — its two callers were always going to be replaced by the split provided/observed helpers and the inject rewrite removed the last one. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sets skillProvided structurally at the inject/register step and populates skillObserved from behavioral signals. In inject mode the skill content lands in the system prompt immediately so skillProvided flips to true at that write site. In discover mode the skill is only spliced into context on a matching <load-skill> fire, so the protocol callback raises BOTH skillProvided (structural — content is about to land in context) and skillObserved (behavioral — agent named the skill in the registry); the once-only gate is renamed discoverSkillLoaded → discoverSkillProvided to match. The deprecated skillLoaded alias mirrors skillProvided for one release. Tests gain parallel skillProvided/skillObserved/skillMode assertions next to existing skillLoaded ones. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sets skillProvided structurally at the inject/register step (prompt concat for inject mode, SKILL.md write + -s flag for discover mode). Drops the 'steps > 0 implies skill was loaded' inference — that heuristic conflated 'agent ran' with 'skill was used' and was the exact bug class motivating the field split. The snippet-match scan now populates skillObserved instead of overloading skillLoaded. The deprecated skillLoaded alias mirrors skillProvided for one release. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sets skillProvided structurally at the prompt-concat step. jiuwenclaw uses prompt prepend for both inject and discover modes (no well-known CLI skill path), so skillMode is still surfaced for downstream consumers but the actual delivery path is identical. Drops the 'steps > 0 implies skill was used' inference; behavioral evidence (snippet echo in response text) now populates skillObserved instead of overloading skillLoaded. Deprecated skillLoaded mirrors skillProvided for one release. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sets skillProvided structurally at the BOOTSTRAP.md / SKILL.md write sites. Drops the 'hasAssistantMessage implies skill loaded' inference in inject mode — that heuristic conflated 'agent ran' with 'skill was used'. Behavioral evidence (skill/load_skill tool call, SKILL.md path in assistant text, or skill snippet echoed in a tool result) now lands in skillObserved rather than overloading skillLoaded. The deprecated skillLoaded alias mirrors skillProvided for one release. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sets skillProvided structurally at the CONTEXT.md (inject) and SKILL.md (discover) write sites. Drops the inject-mode 'step_finish + CONTEXT.md exists' check — both halves were structural and already covered by setting skillProvided at the write step. The skill-tool call (discover) and text-snippet echo (both modes) now populate skillObserved as behavioral evidence. The deprecated skillLoaded alias mirrors skillProvided for one release. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sets skillProvided structurally at the AGENTS.md (inject) and SKILL.md (discover) write sites. Drops the 'steps > 0 implies skill was used' inference. Snippet echo in assistant text now populates skillObserved as behavioral evidence rather than overloading skillLoaded. The deprecated skillLoaded alias mirrors skillProvided for one release. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Propagates the RunResult field split into bench reports. The deprecated skillLoaded alias stays so existing JSON bench reports keep parsing. Refs SJTU-IPADS#14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lec77 · 2026-05-15T08:14:10Z

@YYangZiXin Hi! This PR attempts to fix #14, but takes a slightly different approach than what you proposed in the issue — would appreciate your help verifying it.

You suggested putting /skill <name> in the user prompt. The PR instead writes .claude/skills/<name>/SKILL.md and uses --append-system-prompt to add a MUST directive ("You MUST invoke the Skill tool with name=\"<name>\""). Reasoning: /skill is interactive slash-command syntax, and its behavior in claude -p non-interactive mode is undocumented — we didn't want to bet the fix on it.

The PR reads skillProvided from CC's system.init event's skills[] array, assuming CC will auto-discover workDir-scoped .claude/skills/<name>/SKILL.md and list it at init time.

Would you be willing to run the PR branch against your original failing setup (the GLM / DeepSeek thinking mode environment from #14)? Two things to confirm:

skillProvided=true appears in the evidence (CC actually recognized the skill)
jit-optimize proceeds into Round 1 instead of optimizer abstained (infraBlocked)

git fetch origin pull/20/head:pr-20
git checkout pr-20
# then run the same jit-optimize command that originally hit infra-blocked

Thanks!

lec77 and others added 13 commits May 14, 2026 01:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adapters/claude-code: use CC skill system for inject; split skill telemetry fields (#14)#20

adapters/claude-code: use CC skill system for inject; split skill telemetry fields (#14)#20
lec77 wants to merge 13 commits into
SJTU-IPADS:mainfrom
lec77:fix/skill-telemetry-claude-code-inject

lec77 commented May 14, 2026

Uh oh!

lec77 commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lec77 commented May 14, 2026

Linked issue

Summary

Changes

Test plan

Notes for reviewers

Checklist

Uh oh!

lec77 commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant