[ARI-68] Training corpus collector — log incident/classification pairs for future fine-tuning

**Linear:** `ARI-68`

**Type**: ⚙️ Enabler | **Sprint**: S7 | **Points**: 2

As a **Data Platform OPS engineer**, I want ARIA to record every processed incident alongside its Agent 3 classification so that we accumulate a labeled dataset during the POC that can be used to fine-tune a specialist model in Phase 4.

## Context

Agent 3's long-term target is a self-hosted, fine-tuned model. The driver is data sovereignty: some companies cannot expose incident logs to an external LLM provider. The POC is the best opportunity to collect labeled examples at near-zero cost. A JSONL sink alongside the classifier output is sufficient; human-validated labels are added during M7 acceptance testing.
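A minimal Python sketch of the collector hook. It assumes `ClassificationResult` has already been converted to a plain dict (how ARIA does that is not specified in this ticket), and the function name `collect_classification` is hypothetical. It reads `ARIA_CORPUS_PATH`, returns `None` without raising when the variable is unset, and otherwise appends one JSON object per line with the five fields named in the acceptance criteria. The criteria pair a default of `corpus/classifications.jsonl` with silent skipping when the variable is unset; this sketch follows the skip rule, so exporting `ARIA_CORPUS_PATH=corpus/classifications.jsonl` gives the default location.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional


def collect_classification(
    incident_number: str,
    incident_metadata: dict[str, Any],
    log_lines: list[str],
    classification_result: dict[str, Any],
) -> Optional[Path]:
    """Append one incident/classification pair to the JSONL corpus (hypothetical hook).

    Returns the corpus path on success, or None when ARIA_CORPUS_PATH is unset,
    in which case collection is disabled and the caller carries on unaffected.
    """
    corpus_path = os.environ.get("ARIA_CORPUS_PATH")
    if not corpus_path:
        return None  # silently skipped, never a hard failure

    record = {
        "incident_number": incident_number,
        "incident_metadata": incident_metadata,
        "log_lines": log_lines,
        # Assumption: ClassificationResult was serialized to a dict beforehand.
        "classification_result": classification_result,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    path = Path(corpus_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode: earlier records are never rewritten. json.dumps without
    # indent emits a single line; newlines inside log lines are escaped as \n.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

Because each record occupies exactly one line and is only ever appended, a crash mid-write can leave at most the final line malformed; every earlier record stays readable.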

## Acceptance Criteria

```
Given Agent 3 produces a ClassificationResult
When the result is returned
Then the tuple (incident_number, incident_metadata, log_lines, classification_result, timestamp) is appended to a JSONL file at ARIA_CORPUS_PATH (default: corpus/classifications.jsonl)

Given ARIA_CORPUS_PATH is not set
When Agent 3 runs
Then corpus collection is silently skipped (not a hard failure)

Given 10 incidents have been processed
When the corpus file is read
Then it contains 10 valid JSONL records, each with all required fields

Given a human validator updates a record with human_validated_label
When the corpus is read
Then the record contains both agent_label and human_validated_label fields
```
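The last two criteria can be checked with a reader that validates every line and a helper that attaches a human label to an existing record. Both functions are hypothetical sketches. `set_human_label` assumes the agent's predicted label sits under a `"label"` key in `classification_result` and copies it to `agent_label` when that field is absent; the actual key in `ClassificationResult` is not given in this ticket.

```python
import json
from pathlib import Path
from typing import Any

# The five fields listed in the first acceptance criterion.
REQUIRED_FIELDS = ("incident_number", "incident_metadata", "log_lines",
                   "classification_result", "timestamp")


def read_corpus(path: Path) -> list[dict[str, Any]]:
    """Parse every JSONL line and fail loudly on a malformed or incomplete record."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate a trailing blank line
            record = json.loads(line)  # raises json.JSONDecodeError on invalid JSON
            missing = [f for f in REQUIRED_FIELDS if f not in record]
            if missing:
                raise ValueError(f"line {lineno}: missing fields {missing}")
            records.append(record)
    return records


def set_human_label(path: Path, incident_number: str, label: str) -> int:
    """Attach human_validated_label to matching records; return how many changed."""
    records = read_corpus(path)
    changed = 0
    for record in records:
        if record["incident_number"] != incident_number:
            continue
        # Hypothetical: the agent's label is assumed to live under "label".
        record.setdefault("agent_label", record["classification_result"].get("label"))
        record["human_validated_label"] = label
        changed += 1
    if changed:
        # Write a sibling temp file, then swap it in with a single rename.
        tmp = path.with_name(path.name + ".tmp")
        with tmp.open("w", encoding="utf-8") as fh:
            for record in records:
                fh.write(json.dumps(record, ensure_ascii=False) + "\n")
        tmp.replace(path)
    return changed
```

Rewriting the whole file assumes nothing appends to the corpus while a validator edits it (for example, validation runs on a copy during M7 acceptance testing); records appended between the read and the rename would be lost.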

**GitHub:** #24