Problem
All 7 eval tasks are happy-path activation tests. There are no tests for:
- Negative tests — prompts that should NOT trigger any skill (e.g., "What's the weather?")
- Ambiguity tests — prompts that could match multiple skills
- Error paths — misspelled skill names, invalid inputs
The EvalConfig interface only supports positive assertions (expectedSkillName, expectedOutput). There's no way to express "this should NOT activate."
Suggestion
- Add shouldNotActivate?: boolean to EvalConfig
- Add unexpectedSkillNames?: string[] for skills that should NOT fire
- Update analyzeSession() to handle negative assertions
- Create task configs like no-skill.json, ambiguous.json
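
A minimal sketch of what the negative assertions could look like, assuming the session analysis can produce a list of activated skill names. The helper checkNegativeAssertions and its activatedSkills parameter are hypothetical and not part of the current eval-checker.ts; only the four field names come from this issue, and the existing fields are assumed to be optional strings.

```typescript
// Hypothetical shape of the extended config. The real interface in
// evals/lib/eval-checker.ts may declare the existing fields differently.
interface EvalConfig {
  expectedSkillName?: string;
  expectedOutput?: string;
  // Proposed: the prompt must not trigger any skill at all.
  shouldNotActivate?: boolean;
  // Proposed: specific skills that must not fire for this prompt.
  unexpectedSkillNames?: string[];
}

// Assumed helper (not the existing analyzeSession()): given the names of the
// skills that activated during a session, return one message per violated
// negative assertion. An empty array means every negative assertion held.
function checkNegativeAssertions(
  config: EvalConfig,
  activatedSkills: string[],
): string[] {
  const failures: string[] = [];

  if (config.shouldNotActivate && activatedSkills.length > 0) {
    failures.push(
      `expected no skill to activate, got: ${activatedSkills.join(", ")}`,
    );
  }

  for (const name of config.unexpectedSkillNames ?? []) {
    if (activatedSkills.includes(name)) {
      failures.push(`skill "${name}" should not have activated`);
    }
  }

  return failures;
}
```

Under this sketch, a no-skill.json task would set shouldNotActivate to true, while an ambiguous.json task could pair expectedSkillName with unexpectedSkillNames to pin down which of the competing skills is allowed to fire.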
Files
evals/lib/eval-checker.ts (EvalConfig interface, lines 5-9)
evals/tasks/ (new task configs needed)