
# Litex LLM Development

2025.9.22: GSM8K 93.5%, miniF2F 4.5%. Qwen2.5-Instruct-7B converges after one epoch on GSM8K. The training data (8k GSM8K problems, 100 miniF2F problems) is not enough, and high-quality data is even scarcer.
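The GSM8K accuracy above is typically computed by exact match on the final numeric answer. A minimal sketch, assuming the dataset's `#### <number>` gold-answer convention and that the model's last number is taken as its prediction (the helper names here are illustrative, not the project's actual evaluation code):

```python
import re

def extract_gold(solution: str) -> str:
    """Extract the gold answer after '####' in a GSM8K reference solution."""
    m = re.search(r"####\s*([-+]?[\d,\.]+)", solution)
    return m.group(1).replace(",", "") if m else ""

def extract_pred(completion: str) -> str:
    """Take the last number in the model completion as its answer."""
    nums = re.findall(r"[-+]?\d[\d,]*(?:\.\d+)?", completion)
    return nums[-1].replace(",", "") if nums else ""

def accuracy(golds, preds):
    """Fraction of problems where the predicted number matches the gold one."""
    correct = sum(extract_gold(g) == extract_pred(p) for g, p in zip(golds, preds))
    return correct / len(golds)

golds = ["... so she has 18 eggs left. #### 18"]
preds = ["She sells the rest, leaving 18."]
print(accuracy(golds, preds))  # 1.0
```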

## TODO

1. Collect more miniF2F training data.
2. 93% on GSM8K is not good enough; temperature, top-k, the prompt, and model hyperparameters all need tuning.
3. Use synthetic theorem-proof data for training instead of GSM8K.
4. Reproduce on DeepSeek-Math-7B for comparison with DeepSeek-Prover-1.5-Base.
5. Add more theorem-proving tasks of medium difficulty for RL.
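The temperature and top-k knobs in item 2 can be illustrated in pure NumPy, independent of any inference framework. This is a toy sketch of the standard decoding math, not the project's actual sampling code:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, rng=None):
    """Sample one token id from raw logits with temperature scaling
    and optional top-k filtering."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    if top_k > 0:
        kth = np.sort(logits)[-top_k]              # k-th largest logit
        logits = np.where(logits < kth, -np.inf, logits)  # drop the rest
    probs = np.exp(logits - logits.max())          # stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.1, -1.0]
token = sample_next_token(logits, temperature=0.7, top_k=2)
print(token)  # always 0 or 1: top_k=2 zeroes out the other tokens
```

Lower temperature sharpens the distribution toward the argmax; `top_k=1` is greedy decoding.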

## Environment Setup

    bash setup_env.sh

Download the LoRA weights from Google Drive.

## Training

    bash run.sh

## Evaluation

Evaluate on miniF2F:

    python eval.py
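miniF2F results are usually reported as pass@k over several sampled proofs per problem. A short sketch of the standard unbiased pass@k estimator, assuming the evaluation produces a per-problem count of total and correct samples (the variable names are illustrative):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n total with c correct, is correct."""
    if n - c < k:
        return 1.0  # not enough incorrect samples to fill all k slots
    return 1.0 - comb(n - c, k) / comb(n, k)

# (n_samples, n_correct) per miniF2F problem -- toy numbers
results = [(8, 0), (8, 2), (8, 8)]
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(round(score, 3))  # 0.417
```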