Add issue 10 backtesting case and stabilize LLM JSON handling by BrunoDC-dev · Pull Request #15 · LucasErcolano/MiroFish

BrunoDC-dev · 2026-05-25T01:22:51Z

Summary

Adds the backtesting case for issue #10 (S1 - Backtesting case A: simple self-verifiable event): Argentina vs Colombia, Copa America 2024, with cutoff-controlled input, hidden ground truth, run notes, raw outputs, translated output, and objective evaluation.
Records the successful run report_3736fb6ac644: predicted Argentina, matching the ground truth winner.
Hardens OpenAI-compatible LLM calls used by ontology/Graphiti flows with timeouts, retries, JSON response mode, and schema normalization for common provider deviations.
Keeps Graphiti node attribute extraction disabled by default to avoid oversized/truncated JSON during the pilot flow.
Merges latest main, including the report-agent localization guards that should help prevent Chinese report output going forward.
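
For readers who want a concrete picture of the hardening described above, here is a minimal sketch of retries with backoff plus tolerant JSON parsing. It is not the code in this PR: `extract_json_object` and `call_with_retries` are hypothetical names, and `call` stands in for whatever function performs the OpenAI-compatible request (which, in a real client, would typically also set a request timeout and a JSON response mode such as `response_format={"type": "json_object"}`).

```python
import json
import re
import time
from typing import Any, Callable


def extract_json_object(raw: str) -> dict[str, Any]:
    """Parse a JSON object from an LLM reply, tolerating common deviations."""
    text = raw.strip()
    # Some providers wrap the JSON in a Markdown fence even in JSON mode.
    fence = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, flags=re.DOTALL)
    if fence:
        text = fence.group(1)
    # Others add prose around the object; keep the outermost {...} span.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in LLM reply")
    # json.JSONDecodeError is a ValueError, so callers can catch one type.
    return json.loads(text[start : end + 1])


def call_with_retries(
    call: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Run `call` until it returns parseable JSON, with exponential backoff."""
    last_error: Exception | None = None
    for attempt in range(attempts):
        try:
            return extract_json_object(call())
        except (ValueError, TimeoutError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2**attempt)
    raise RuntimeError(f"LLM call failed after {attempts} attempts") from last_error
```

Passing `sleep` as a parameter is only a convenience so the backoff can be exercised in tests without waiting.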

Linked issue

Closes #10

How to test

cd backend && PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run --frozen pytest ../tests/test_report_agent_resilience.py
cd backend && uv run --frozen python -m compileall app
npm run build

Validation

cd backend && PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run --frozen pytest ../tests/test_report_agent_resilience.py -> 21 passed
cd backend && uv run --frozen python -m compileall app -> passed
npm run build -> passed, with existing Vite chunk-size/dynamic-import warnings

Notes

Backtesting.pdf and prueba-comedor.txt were intentionally left untracked because they are local/reference artifacts, not required for the PR.
The successful report is objectively correct for the winner metric, but the original raw output had language-quality issues. The translated copy is included only as an analysis artifact, while the raw output remains preserved.


BrunoDC-dev · 2026-05-25T01:57:14Z

Updated validation status:

The GitHub Actions validate check initially failed only because of the PR body format: the ## Linked issue and ## How to test sections were missing.
Those sections have now been added, with Closes #10 and the test steps.
The new validate run passed.

Local validation performed before opening the PR:

cd backend && PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run --frozen pytest ../tests/test_report_agent_resilience.py -> 21 passed
cd backend && uv run --frozen python -m compileall app -> passed
npm run build -> passed, with only the existing Vite warnings about chunk size/dynamic imports

Current status: the PR is in order as far as checks/validation are concerned. It remains a draft so that we can review it before marking it ready.

BrunoDC-dev · 2026-05-25T01:58:32Z

Mapping against the acceptance criteria of issue #10:

Case sheet with domain, question, x, delta, and actual result: included in backtesting/case-a/README.md and backtesting/case-a/ground-truth.md.
Input documents versioned/listed with date/source: included in backtesting/case-a/input/source-01-opta-preview.txt and backtesting/case-a/input/source-02-conmebol-preview.txt; the source and cutoff details are documented in README.md.
The question contains no information later than x: the prompt is saved in backtesting/case-a/prompt.md, with an explicit cutoff of July 13, 2024 and without the final result.
The system output is saved: outputs are preserved in backtesting/case-a/output/report_2d2de41798cf.md and backtesting/case-a/output/report_3736fb6ac644.md; report_3736fb6ac644.es.md is also added as an auxiliary translation of the successful report.
The evaluation states hit/miss with a brief justification: documented in backtesting/case-a/evaluation.md.
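
The cutoff criterion above can be checked mechanically. The following is a rough sketch, not part of this PR: `SourceDoc` and `documents_after_cutoff` are hypothetical names, the publication dates in the usage lines are made up for illustration, and the cutoff is assumed to be inclusive (a document dated exactly July 13, 2024 is allowed).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceDoc:
    name: str
    published: date  # publication date recorded for the input document


def documents_after_cutoff(docs: list[SourceDoc], cutoff: date) -> list[str]:
    """Return the names of documents published strictly after the cutoff."""
    return [doc.name for doc in docs if doc.published > cutoff]


# Illustrative dates only; the real ones are documented in the case README.
docs = [
    SourceDoc("source-01-opta-preview.txt", date(2024, 7, 12)),
    SourceDoc("source-02-conmebol-preview.txt", date(2024, 7, 13)),
]
leaks = documents_after_cutoff(docs, cutoff=date(2024, 7, 13))
```

An empty `leaks` list means no listed input postdates the cutoff; it says nothing about whether the text of a document mentions later events, which still needs a manual read.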

Expected evidence:

Case folder/document: backtesting/case-a/
Input used: backtesting/case-a/input/
MiroFish output: backtesting/case-a/output/
Initial objective evaluation: backtesting/case-a/evaluation.md

Result of the evaluable run: MiroFish predicted Argentina and the ground truth was also Argentina, so the initial objective metric is recorded as a hit. Note: the raw report had language/quality problems, which are recorded in the evaluation, but they do not change the hit/miss outcome of the binary metric.
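
As an illustration of the binary winner metric described above, here is a small sketch. It assumes the predicted winner has already been extracted from the report as a team name; `normalize_team` and `winner_metric` are hypothetical helpers, not functions from this repository.

```python
import unicodedata


def normalize_team(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace in a team name."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def winner_metric(predicted: str, ground_truth: str) -> bool:
    """Return True (hit) when the predicted winner matches the ground truth."""
    return normalize_team(predicted) == normalize_team(ground_truth)
```

The metric deliberately ignores report language and quality, matching the note that those issues are recorded separately in the evaluation.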

BrunoDC-dev added 2 commits May 24, 2026 22:19

Add backtesting case A and robust LLM handling

2e49647

Merge remote-tracking branch 'origin/main' into feat/issue-10-backtes…

c27bf6e

…ting-case-a

docs(backtesting): add benchmark objective

383872f

LucasErcolano mentioned this pull request May 27, 2026

S2 - Investigador 3: Sensibilidad posicional (Línea 3) + Ruido temporal (Línea 4) combinados #19

Open

17 tasks

Andresbravo9 mentioned this pull request Jun 3, 2026

feat(backtesting): Case B — noisy quantitative domain (B1 BTC-ETF + B2 ARG-IPC) #22

Open

5 tasks

BrunoDC-dev wants to merge 3 commits into main from feat/issue-10-backtesting-case-a
