Linear: ARI-35
Type: π΅ User Story | Sprint: 7 | Points: 3 | AC: AC-03, AC-04
As an on-call engineer, I want ARIA to return relevant logs for at least 80% of incidents and classify errors correctly at least 70% of the time so that findings are trustworthy enough to act on.
Acceptance Criteria
Given 10 test incidents, 8 of which have available log fixtures
When Agent 2 runs on the 8 incidents with logs
Then at least 1 relevant log line is returned for 7 of those 8 (β₯80% recall β AC-03)
Given 10 test incidents with known ground-truth error classes
When Agent 3 classifies each
Then at least 7 of 10 classifications match the ground truth (β₯70% accuracy β AC-04)
Given AC-03 and AC-04 results are recorded
Then misses are analysed: were they due to missing logs, wrong platform tag, or LLM error?
And the findings are documented in the acceptance test report
Linear:
ARI-35Type: π΅ User Story | Sprint: 7 | Points: 3 | AC: AC-03, AC-04
As an on-call engineer, I want ARIA to return relevant logs for at least 80% of incidents and classify errors correctly at least 70% of the time so that findings are trustworthy enough to act on.
Acceptance Criteria