mint-cookbook is a monorepo of independent, self-contained MinT reproductions.
Each directory under experiments/ is a runnable work unit: enter the directory, read local AGENTS.md, then PROMPT.md when present, then README.md and train.py, run uv sync, and validate the local benchmark path before changing anything.
- Eval-first reproductions with stable benchmark entrypoints
- Small, readable experiment directories instead of early shared frameworks
- Single-file experiment runtimes with harness-level reuse, not cross-experiment imports
- Clear separation between train data, eval data, benchmark protocol, and practical execution baseline
Default order inside one experiment:
- read local AGENTS.md, then PROMPT.md when present, then README.md and train.py
- uv sync
- uv run train.py --dry-run --eval-data <smoke_eval_path>
- uv run train.py --eval-only --eval-data <full_eval_path>
- only then run training or the experiment wrapper
For expensive live benchmark reruns, prefer the experiment-local dry-run or single eval-only smoke path first. The full benchmark cost differs a lot across experiments, especially for dapo-aime, fingpt, and lawbench.
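The default order above can be sketched as a small helper that builds the validation commands and runs them, stopping at the first failure. This is a minimal sketch, not part of the repo: the names `default_order` and `run_in_order` are hypothetical, and only the `uv` commands and flags quoted above are taken from this README.

```python
import shlex
import subprocess
from typing import Callable, List


def default_order(smoke_eval_path: str, full_eval_path: str) -> List[List[str]]:
    """Return the validation commands in the order this README lists them."""
    return [
        ["uv", "sync"],
        # Cheap smoke check first.
        ["uv", "run", "train.py", "--dry-run", "--eval-data", smoke_eval_path],
        # Full eval-only benchmark pass second.
        ["uv", "run", "train.py", "--eval-only", "--eval-data", full_eval_path],
    ]


def run_in_order(
    commands: List[List[str]],
    experiment_dir: str,
    run: Callable = subprocess.run,
) -> None:
    """Run each command inside the experiment directory.

    With the default runner, check=True raises CalledProcessError on a
    non-zero exit, so later (more expensive) steps are never reached.
    """
    for argv in commands:
        run(argv, cwd=experiment_dir, check=True)


if __name__ == "__main__":
    # Print the plan only; nothing is executed here.
    for argv in default_order("<smoke_eval_path>", "<full_eval_path>"):
        print(shlex.join(argv))
```

Passing the runner as a parameter keeps the sketch testable without invoking `uv`; in a real checkout you would call `run_in_order(default_order(...), "experiments/<name>")` with the paths from that experiment's README.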
Repo-level docs by scope:
- AGENTS.md: stable repo rules and hard constraints
- PROMPT.md: current repo-wide task package
- experiments/maintained.json: machine-readable maintained experiment registry for developers, AI agents, and repo tooling
- experiments/README.md: shared experiment contract
- tests/README.md: repo-level verification entrypoint for local contract tests plus live MinT smoke scripts
- scaffolds/README.md: scaffold and ownership rules
- skills/README.md: shared skill layout plus local tool-routing guidance
- docs/repo-overview.md: longer current-state repo map when you need more context
Repo-local reusable agent skills live under skills/. Treat that directory as the single source of truth.
When adding or updating a repo-local skill, edit skills/<name>/ directly.
For local discovery in a developer checkout, point tool-specific skill directories back to that source of truth with symlinks such as .codex/skills -> ../skills and .claude/skills -> ../skills.
Do not duplicate skill contents under .codex/ or .claude/; skills/ remains the only checked-in source of truth.
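The symlink layout described above can be set up with a few lines of standard-library Python. This is a sketch under stated assumptions: the function name `link_skill_dirs` is hypothetical, and it deliberately leaves any existing `.codex/skills` or `.claude/skills` entry untouched rather than overwriting it.

```python
from pathlib import Path
from typing import Iterable, List


def link_skill_dirs(
    repo_root: Path,
    tool_dirs: Iterable[str] = (".codex", ".claude"),
) -> List[Path]:
    """Create <tool>/skills -> ../skills symlinks; return the links created."""
    created = []
    for tool in tool_dirs:
        tool_dir = repo_root / tool
        tool_dir.mkdir(exist_ok=True)
        link = tool_dir / "skills"
        # Skip anything already present (including dangling symlinks) so a
        # developer's local setup is never clobbered.
        if link.is_symlink() or link.exists():
            continue
        # A relative target keeps the link valid if the checkout is moved.
        link.symlink_to(Path("..") / "skills", target_is_directory=True)
        created.append(link)
    return created
```

Because the links are relative and point back at skills/, no skill content is copied; the tool-specific directories only redirect to the single checked-in source. On Windows, creating symlinks may require elevated privileges or developer mode.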
- chat-dpo: pairwise chat DPO with held-out preference eval
- dapo-aime: direct GRPO on DAPO-Math-17k with an AIME 2024 benchmark plus AIME 2025/2026 eval manifests
- fingpt: FinGPT reproduction scaffold with Fineval + sentiment eval and an SFT path
- lawbench: LawBench benchmark-first scaffold with a maintained LoRA SFT line
cd experiments/fingpt
uv sync
uv run train.py --dry-run --task-type fineval --eval-data smoke:data/smoke_eval.jsonl
uv run train.py --eval-only --task-type fineval --eval-data fineval:data/fingpt-fineval/test.jsonl

Each experiment README.md should be enough to run that experiment without reading the rest of the repo.
- Keep experiments self-contained.
- Do not import helpers across experiments.
- When logging, CLI, artifacts, or stdout contracts change, update code and docs in the same change set.
- The shared repo-root .env.example is the only checked-in live env template; runtime code still reads only the local .env beside the entrypoint or harness you actually run.
- New experiments still start from the scaffold flow in scaffolds/, and that scaffold now defaults to import mint, MINT_*, and --mint-timeout from the first draft rather than a tinker-first runtime skeleton.
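The rule against importing helpers across experiments could be checked mechanically. The following is a rough sketch, not an existing repo tool: the function name `cross_experiment_imports` is hypothetical, and it assumes a cross-experiment import would show up as a dotted module path containing a sibling experiment's directory name (with hyphens also matched as underscores, since names like chat-dpo are not valid module names).

```python
import ast
from typing import Iterable, List


def cross_experiment_imports(source: str, sibling_names: Iterable[str]) -> List[str]:
    """Return imported module paths that mention a sibling experiment."""
    siblings = set()
    for name in sibling_names:
        siblings.add(name)
        # Assumption of this sketch: hyphenated directories would be
        # imported under an underscore spelling.
        siblings.add(name.replace("-", "_"))

    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x".
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            if siblings.intersection(module.split(".")):
                flagged.append(module)
    return flagged
```

For example, scanning experiments/fingpt/train.py with `sibling_names=["chat-dpo", "dapo-aime", "lawbench"]` would flag a line such as `from experiments.chat_dpo.train import helper`. The check is purely syntactic and will miss dynamic imports or `sys.path` manipulation.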