Add issue 10 backtesting case and stabilize LLM JSON handling #15
Draft
BrunoDC-dev wants to merge 3 commits into
Updated validation status:
Local validations performed before opening the PR:
Current status: the PR is in order on the checks/validation side. It remains a draft so that we can review it before marking it ready.
Mapping against the acceptance criteria of issue #10:
Expected evidence:
Result of the evaluable run: MiroFish predicted Argentina and the ground truth was also Argentina, so the initial objective metric is recorded as a hit. Note: the raw report had language/quality problems, which are recorded in the evaluation, but they do not change the hit/miss outcome of the binary metric.
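For reference, the binary metric described above can be sketched roughly as follows. This is a minimal illustration, not the project's evaluation code: the `BacktestCase` class, its field names, and the `is_hit` helper are hypothetical, and the case-insensitive comparison is an assumption about how outcome labels should be matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacktestCase:
    """One evaluable run. Hypothetical structure, not the project's schema."""
    report_id: str
    predicted: str
    ground_truth: str


def is_hit(case: BacktestCase) -> bool:
    """Binary metric: compare only the outcome labels.

    Report language or quality problems are deliberately not an input here,
    so they cannot change the hit/miss result.
    Assumption: labels match ignoring case and surrounding whitespace.
    """
    return case.predicted.strip().casefold() == case.ground_truth.strip().casefold()


# The run discussed in this PR: prediction and ground truth are both Argentina.
case = BacktestCase("report_3736fb6ac644", "Argentina", "Argentina")
print("hit" if is_hit(case) else "miss")
```

Keeping report quality out of the function signature is what makes the metric binary and objective; quality observations would be recorded separately in the evaluation notes.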
Summary
- report_3736fb6ac644: predicted Argentina, matching the ground truth winner.
- main, including the report-agent localization guards that should help prevent Chinese report output going forward.
Linked issue
Closes #10
How to test
- cd backend && PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run --frozen pytest ../tests/test_report_agent_resilience.py
- cd backend && uv run --frozen python -m compileall app
- npm run build
Validation
- cd backend && PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run --frozen pytest ../tests/test_report_agent_resilience.py -> 21 passed
- cd backend && uv run --frozen python -m compileall app -> passed
- npm run build -> passed, with existing Vite chunk-size/dynamic-import warnings
Notes
Backtesting.pdf and prueba-comedor.txt were intentionally left untracked because they are local/reference artifacts, not required for the PR.