Spike: Investigate evals framework for ai-helpers skills #100

Research eval frameworks and define quality criteria for AI skill outputs. Coordinate with UXD on their evals research to date before recommending an approach.

**Acceptance Criteria:**

- Talk with UXD about their evals research: what they've explored, what worked, and what didn't

- Reference the Confluence "Skill Evaluation" page in the UIE space for prior team thinking

- Research existing eval frameworks (e.g., promptfoo, Braintrust, custom harness) and recommend one that fits our skill architecture

- Define what "good output" means for at least 3 existing skills (e.g., pf-unit-test-generator, pf-compliance-checker, pf-design-mode)

- Document the recommended framework, criteria definitions, and eval patterns so implementation stories can execute against them

- Deliver findings as a written summary (Google Doc or Confluence page)
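
To make "criteria definitions" and "eval patterns" concrete, here is a minimal sketch of what a custom-harness check might look like. It is an illustration only, not a recommendation from the spike: the `Criterion` and `evaluate` names are hypothetical, and the three criteria assume that pf-unit-test-generator emits Jest-style test code, which has not been confirmed here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """One named pass/fail check applied to a skill's raw text output."""
    name: str
    check: Callable[[str], bool]


def evaluate(output: str, criteria: list[Criterion]) -> dict[str, bool]:
    """Run every criterion against the output and report pass/fail per name."""
    return {c.name: c.check(output) for c in criteria}


# Hypothetical criteria for pf-unit-test-generator.
# Assumption: the skill produces Jest-style tests (it/test/describe + expect).
unit_test_criteria = [
    Criterion("non_empty", lambda out: bool(out.strip())),
    Criterion(
        "has_test_block",
        lambda out: any(tok in out for tok in ("describe(", "it(", "test(")),
    ),
    Criterion("has_assertion", lambda out: "expect(" in out),
]

# Made-up sample output, used only to exercise the harness.
sample_output = 'it("renders", () => { expect(true).toBe(true); });'
results = evaluate(sample_output, unit_test_criteria)
print(results)
```

Deterministic checks like these are cheap to run but only cover surface properties. Whether model-graded or human-graded criteria are also needed is one of the questions the spike should answer.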

---

**Jira Issue:** [PF-4231](https://redhat.atlassian.net/browse/PF-4231)
