Skip to content

[ARI-35] Validate AC-03 and AC-04 β€” Log recall and classification accuracyΒ #14

@bjridicodes

Description

@bjridicodes

Linear: ARI-35

Type: πŸ”΅ User Story | Sprint: 7 | Points: 3 | AC: AC-03, AC-04

As an on-call engineer, I want ARIA to return relevant logs for at least 80% of incidents and classify errors correctly at least 70% of the time so that findings are trustworthy enough to act on.

Acceptance Criteria

Given 10 test incidents, 8 of which have available log fixtures
When Agent 2 runs on the 8 incidents with logs
Then at least 1 relevant log line is returned for 7 of those 8 (β‰₯80% recall β€” AC-03)

Given 10 test incidents with known ground-truth error classes
When Agent 3 classifies each
Then at least 7 of 10 classifications match the ground truth (β‰₯70% accuracy β€” AC-04)

Given AC-03 and AC-04 results are recorded
Then misses are analysed: were they due to missing logs, wrong platform tag, or LLM error?
And the findings are documented in the acceptance test report

Metadata

Metadata

Assignees

No one assigned

    Labels

    phase-1Phase 1 β€” POC notify-only

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions