[ARI-35] Validate AC-03 and AC-04 — Log recall and classification accuracy

**Linear:** `ARI-35`

**Type**: 🔵 User Story | **Sprint**: 7 | **Points**: 3 | **AC**: AC-03, AC-04

As an **on-call engineer**, I want ARIA to return relevant logs for at least 80% of incidents and classify errors correctly at least 70% of the time so that findings are trustworthy enough to act on.

## Acceptance Criteria

```
Given 10 test incidents, 8 of which have available log fixtures
When Agent 2 runs on the 8 incidents with logs
Then at least 1 relevant log line is returned for 7 of those 8 (≥80% recall — AC-03)

Given 10 test incidents with known ground-truth error classes
When Agent 3 classifies each
Then at least 7 of 10 classifications match the ground truth (≥70% accuracy — AC-04)

Given AC-03 and AC-04 results are recorded
Then misses are analysed: were they due to missing logs, wrong platform tag, or LLM error?
And the findings are documented in the acceptance test report
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ARI-35] Validate AC-03 and AC-04 — Log recall and classification accuracy #14

Acceptance Criteria

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[ARI-35] Validate AC-03 and AC-04 — Log recall and classification accuracy #14

Description

Acceptance Criteria

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions