Spike: Investigate evals framework for ai-helpers skills #100

@patternfly-jira-sync

Description

Research eval frameworks and define quality criteria for AI skill outputs. Coordinate with UXD on their evals research to date before recommending an approach.

Acceptance Criteria:

  • Talk with UXD about their evals research: what they've explored, what worked, and what didn't

  • Reference the Confluence "Skill Evaluation" page in the UIE space for prior team thinking

  • Research existing eval frameworks (e.g., promptfoo, Braintrust, custom harness) and recommend one that fits our skill architecture

  • Define what "good output" means for at least 3 existing skills (e.g., pf-unit-test-generator, pf-compliance-checker, pf-design-mode)

  • Document the recommended framework, criteria definitions, and eval patterns so implementation stories can execute against them

  • Deliver findings as a written summary (Google Doc or Confluence page)
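
As a starting point for discussion, the "custom harness" option and the idea of per-skill criteria definitions could be sketched roughly as below. This is only an illustration under stated assumptions: the `Criterion` class, the `evaluate` function, the three checks, and the sample output are all hypothetical, and none of it reflects the API of promptfoo or Braintrust or the team's actual definition of "good output". The real criteria are what this spike is meant to produce.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """One named, deterministic check applied to a skill's text output (hypothetical)."""
    name: str
    check: Callable[[str], bool]


def evaluate(output: str, criteria: list[Criterion]) -> dict[str, bool]:
    """Run every criterion against the output and report pass/fail per name."""
    return {c.name: c.check(output) for c in criteria}


# Assumed, illustrative criteria for a skill that generates unit tests.
# Simple substring checks stand in for whatever the spike ends up recommending.
unit_test_criteria = [
    Criterion("imports_testing_library", lambda out: "@testing-library/react" in out),
    Criterion("has_test_case", lambda out: "it(" in out or "test(" in out),
    Criterion("has_assertion", lambda out: "expect(" in out),
]

# Made-up skill output used only to exercise the harness.
sample_output = """
import { render, screen } from '@testing-library/react';
import { Button } from '@patternfly/react-core';

it('renders the label', () => {
  render(<Button>Save</Button>);
  expect(screen.getByText('Save')).toBeInTheDocument();
});
"""

results = evaluate(sample_output, unit_test_criteria)
pass_rate = sum(results.values()) / len(results)
print(results, pass_rate)
```

Keeping each criterion as a named, deterministic check makes the definition of "good output" reviewable on its own, independent of which framework ends up running it. Checks that need judgment (for example, whether a generated test is meaningful) would likely need a different mechanism, which is part of what the framework comparison should address.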


Jira Issue: PF-4231
