
Eval: add negative and ambiguity test cases #49

@olaservo

Problem

All 7 eval tasks are happy-path activation tests. There are no tests for:

  1. Negative tests — prompts that should NOT trigger any skill (e.g., "What's the weather?")
  2. Ambiguity tests — prompts that could match multiple skills
  3. Error paths — misspelled skill names, invalid inputs

The EvalConfig interface only supports positive assertions (expectedSkillName, expectedOutput). There's no way to express "this should NOT activate."

Suggestion

  • Add shouldNotActivate?: boolean to EvalConfig
  • Add unexpectedSkillNames?: string[] for skills that should NOT fire
  • Update analyzeSession() to handle negative assertions
  • Create task configs like no-skill.json, ambiguous.json
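A minimal sketch of how the two proposed fields could be checked, offered only as a starting point. The real shape of EvalConfig and the signature of analyzeSession() are not shown in this issue, so the optionality of the existing fields, the helper name checkNegativeAssertions, and its activatedSkills parameter are all assumptions.

```typescript
// Hypothetical sketch: only the four field names come from this issue.
// The optionality of the existing fields is a guess.
interface EvalConfig {
  expectedSkillName?: string;      // existing positive assertion
  expectedOutput?: string;         // existing positive assertion
  shouldNotActivate?: boolean;     // proposed: no skill may activate at all
  unexpectedSkillNames?: string[]; // proposed: these specific skills must not fire
}

// Returns failure messages; an empty array means every negative assertion passed.
// `activatedSkills` is an assumed input: the skill names observed in the session.
function checkNegativeAssertions(
  config: EvalConfig,
  activatedSkills: string[],
): string[] {
  const failures: string[] = [];

  // shouldNotActivate fails if any skill fired.
  if (config.shouldNotActivate && activatedSkills.length > 0) {
    failures.push(
      `expected no skill to activate, but got: ${activatedSkills.join(", ")}`,
    );
  }

  // unexpectedSkillNames fails once per listed skill that fired.
  const banned = new Set(config.unexpectedSkillNames ?? []);
  for (const name of new Set(activatedSkills)) {
    if (banned.has(name)) {
      failures.push(`unexpected skill activated: ${name}`);
    }
  }

  return failures;
}
```

Returning a list of messages rather than a boolean lets analyzeSession() report every violated assertion in one run. Whether that fits the existing result type is something to confirm against evals/lib/eval-checker.ts.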

Files

  • evals/lib/eval-checker.ts (EvalConfig interface, lines 5-9)
  • evals/tasks/ (new task configs needed)
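For illustration, the new task configs might look roughly like the objects below, written in TypeScript and serialized to JSON. The existing task file format is not shown in this issue, so the TaskConfig type, the prompt field, and the skill names report-summary and spreadsheet-export are hypothetical placeholders.

```typescript
// Hypothetical task shape: `prompt` and both skill names are illustrative assumptions.
interface TaskConfig {
  prompt: string;
  expectedSkillName?: string;
  shouldNotActivate?: boolean;
  unexpectedSkillNames?: string[];
}

// Negative test: an off-topic prompt that should trigger no skill.
const noSkill: TaskConfig = {
  prompt: "What's the weather?",
  shouldNotActivate: true,
};

// Ambiguity test: a prompt that could plausibly match two skills.
// It names the one that should win and the one that must stay silent.
const ambiguous: TaskConfig = {
  prompt: "Summarize this quarterly report as a table",
  expectedSkillName: "report-summary",
  unexpectedSkillNames: ["spreadsheet-export"],
};

// Serialized forms, as they might be saved under evals/tasks/.
const taskFiles: Record<string, string> = {
  "no-skill.json": JSON.stringify(noSkill, null, 2),
  "ambiguous.json": JSON.stringify(ambiguous, null, 2),
};
```

Pairing expectedSkillName with unexpectedSkillNames is one way to express an ambiguity test. A case where either skill is acceptable would need a different assertion that this issue does not propose.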

Metadata

Labels: enhancement (New feature or request)
