Linear: `ARI-68`
Type: ⚙️ Enabler | Sprint: S7 | Points: 2
As a Data Platform OPS engineer, I want ARIA to record every processed incident alongside its Agent 3 classification so that we accumulate a labeled dataset during the POC that can be used to fine-tune a specialist model in Phase 4.
Context
Agent 3's long-term target is a self-hosted fine-tuned model, motivated by data sovereignty: some companies cannot expose incident logs to an external LLM provider. The POC is the best opportunity to collect labeled examples at near-zero cost. A JSONL sink alongside the classifier output is sufficient; human-validated labels are added during M7 acceptance testing.
Acceptance Criteria
```
Given Agent 3 produces a ClassificationResult
When the result is returned
Then the tuple (incident_number, incident_metadata, log_lines, classification_result, timestamp) is appended to a JSONL file at ARIA_CORPUS_PATH (default: corpus/classifications.jsonl)

Given ARIA_CORPUS_PATH is not set
When Agent 3 runs
Then corpus collection is silently skipped (not a hard failure)

Given 10 incidents have been processed
When the corpus file is read
Then it contains 10 valid JSONL records, each with all required fields

Given a human validator updates a record with human_validated_label
When the corpus is read
Then the record contains both agent_label and human_validated_label fields
```
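A minimal sketch of the JSONL sink described by the first three scenarios might look like the following. It is illustrative only: the function names `append_to_corpus` and `read_corpus`, the `corpus_path` override parameter, and the use of plain dicts in place of the real `ClassificationResult` type are all assumptions, not part of ARIA. The criteria mention both a default path and a skip when `ARIA_CORPUS_PATH` is unset; this sketch follows the skip scenario and assumes the default would be supplied by deployment configuration.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional


def append_to_corpus(
    incident_number: str,
    incident_metadata: Dict[str, Any],
    log_lines: List[str],
    classification_result: Dict[str, Any],
    corpus_path: Optional[str] = None,  # hypothetical override, handy in tests
) -> bool:
    """Append one record to the JSONL corpus.

    Returns False when no path is configured, so the caller never fails
    because corpus collection is switched off.
    """
    path = corpus_path or os.environ.get("ARIA_CORPUS_PATH")
    if not path:
        return False  # ARIA_CORPUS_PATH not set: skip silently

    record = {
        "incident_number": incident_number,
        "incident_metadata": incident_metadata,
        "log_lines": log_lines,
        "classification_result": classification_result,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # One JSON object per line; append mode keeps earlier records intact.
    with target.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return True


def read_corpus(path: str) -> List[Dict[str, Any]]:
    """Load every non-empty line of the corpus as a dict."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling `append_to_corpus` once per classified incident and then `read_corpus` on the same path should yield one dict per incident, which is what the "10 incidents" scenario checks. Append-only writes keep the sink cheap; updating a record with `human_validated_label` would need a separate rewrite step that this sketch does not cover.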