
feat(M85): Offline Mode — File-Based Comparison Without Endpoints#184

Merged
hlin99 merged 1 commit into main from feat/m85-file-compare on Apr 6, 2026


hlin99 (Member) commented Apr 6, 2026

Summary

Add xpyd-acc compare-files subcommand for offline comparison of pre-collected outputs.

Changes

  • file_compare.py: load_outputs(), run_file_compare(), format_file_compare()
  • JSONL format: {"id": "...", "output": "...", "logprobs": [...]} per line
  • Full batch comparison pipeline (matching, classification, statistics) without API calls
  • CLI: compare-files --baseline <path> --target <path> with all export flags
  • Match config: --normalize-whitespace, --ignore-case, --numeric-tolerance
  • 20 tests covering loading, comparison, edge cases, CLI integration
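
The JSONL record shape listed above can be illustrated with a small loader. This is a minimal sketch, not the code in file_compare.py: the name load_outputs_sketch, the choice to skip blank lines, the rejection of duplicate ids, and the treatment of logprobs as optional are all assumptions made for illustration.

```python
import json
from typing import Any


def load_outputs_sketch(lines: list[str]) -> dict[str, dict[str, Any]]:
    """Parse JSONL records into a dict keyed by id (hypothetical helper)."""
    records: dict[str, dict[str, Any]] = {}
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # assumption: blank lines are tolerated
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(rec, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        for field in ("id", "output"):
            if field not in rec:
                raise ValueError(f"line {lineno}: missing field '{field}'")
        rec_id = str(rec["id"])
        if rec_id in records:
            # assumption: duplicate ids are treated as an error
            raise ValueError(f"line {lineno}: duplicate id '{rec_id}'")
        records[rec_id] = {
            "output": rec["output"],
            "logprobs": rec.get("logprobs"),  # None when the field is absent
        }
    return records


sample = [
    '{"id": "q1", "output": "hello", "logprobs": [-0.1, -0.2]}',
    '{"id": "q2", "output": "world"}',
]
loaded = load_outputs_sketch(sample)
```

Reporting the line number and field name in each error mirrors the validation behaviour the reviews describe; reading from a real file would only require passing the open file's lines instead of a list.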

Exports Supported

JSON, CSV, Markdown, JUnit XML (via existing BatchReport methods)

Closes #183

- file_compare.py: load_outputs(), run_file_compare(), format_file_compare()
- JSONL format: {id, output, logprobs?} per line
- Full batch comparison pipeline (matching, classification, statistics)
- CLI subcommand compare-files with --json/--csv/--markdown/--junit export
- Match config support: --normalize-whitespace, --ignore-case, --numeric-tolerance
- 20 tests covering loading, comparison, exports, edge cases, CLI
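
For readers unfamiliar with the three match options, the sketch below shows one plausible way they could combine. It is an assumption-laden illustration: MatchOptions and outputs_match are hypothetical names, not the project's MatchConfig or normalized_match, and the numeric-tolerance rule (non-numeric text must match exactly, numbers may differ by at most the tolerance) is a guess at reasonable semantics.

```python
import re
from dataclasses import dataclass

# Matches integers and simple decimals; exponents are out of scope here.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


@dataclass(frozen=True)
class MatchOptions:
    """Hypothetical stand-in for the CLI match flags."""
    normalize_whitespace: bool = False
    ignore_case: bool = False
    numeric_tolerance: float = 0.0


def outputs_match(a: str, b: str, opts: MatchOptions) -> bool:
    if opts.normalize_whitespace:
        # Collapse runs of whitespace and trim both ends.
        a, b = " ".join(a.split()), " ".join(b.split())
    if opts.ignore_case:
        a, b = a.casefold(), b.casefold()
    if a == b:
        return True
    if opts.numeric_tolerance <= 0:
        return False
    # The text between numbers must be identical piece by piece,
    # which also guarantees both strings hold the same count of numbers.
    if _NUMBER.split(a) != _NUMBER.split(b):
        return False
    nums_a, nums_b = _NUMBER.findall(a), _NUMBER.findall(b)
    return all(
        abs(float(x) - float(y)) <= opts.numeric_tolerance
        for x, y in zip(nums_a, nums_b)
    )


loose = MatchOptions(normalize_whitespace=True, ignore_case=True)
same_text = outputs_match("Hello   World", "hello world", loose)
close_numbers = outputs_match(
    "score: 0.501", "score: 0.5", MatchOptions(numeric_tolerance=0.01)
)
```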

Closes #183

hlin99-Review-Bot left a comment

✅ LGTM. Clean implementation — reuses existing BatchReport/MatchConfig nicely, solid error handling in load_outputs, good test coverage (20 tests including edge cases). CI green.

hlin99-Review-BotX left a comment

Approved (hlin99-Review-BotX)

Idea Value: High — offline file-based comparison is a natural extension. Users can now compare pre-collected outputs without live endpoints, enabling CI pipelines, reproducible benchmarks, and air-gapped workflows.

Code Quality: Clean implementation.

  • load_outputs() has proper validation with clear error messages (line numbers, field names)
  • ID-matching logic with helpful mismatch diagnostics
  • Reuses existing MatchConfig / normalized_match / compute_report infrastructure — no duplication
  • All 4 export formats (JSON, CSV, Markdown, JUnit) supported
  • 20 tests covering load, compare, format, CLI, and edge cases
  • docs/iterations/current.md updated
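
The ID-matching step with mismatch diagnostics mentioned in the list above might look roughly like the following. This is a sketch under stated assumptions: pair_by_id is a hypothetical name, and raising an error on unmatched ids (rather than, say, reporting them and continuing) is an assumption about behaviour the PR text does not spell out.

```python
def pair_by_id(
    baseline: dict[str, str], target: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Pair outputs that share an id; explain any ids that do not line up."""
    missing_in_target = sorted(baseline.keys() - target.keys())
    missing_in_baseline = sorted(target.keys() - baseline.keys())
    if missing_in_target or missing_in_baseline:
        # Assumption: a mismatch is fatal and both directions are reported.
        raise ValueError(
            "id mismatch: "
            f"only in baseline={missing_in_target}, "
            f"only in target={missing_in_baseline}"
        )
    # Keep the baseline file's order so reports are stable across runs.
    return [(key, baseline[key], target[key]) for key in baseline]


pairs = pair_by_id({"q1": "x", "q2": "y"}, {"q2": "y", "q1": "z"})
exact_matches = sum(1 for _, base, tgt in pairs if base == tgt)
match_rate = exact_matches / len(pairs)
```

Listing the unmatched ids in both directions makes it easier to tell a truncated target file from a baseline that was collected with a different prompt set.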

CI: all checks pass. LGTM.

hlin99 merged commit 26d5fd0 into main on Apr 6, 2026
5 checks passed
hlin99 deleted the feat/m85-file-compare branch on April 6, 2026 07:57