Releases · Mikesterner87/Nano-R1

This release, tagged v3.2.9, updates the fine-tuning of the Qwen2.5-3B-Instruct model with Group Relative Policy Optimization (GRPO) on the GSM8K dataset. It improves the model's performance and streamlines the training workflow, so users can expect more efficient training runs and better results in their applications.
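For readers new to GRPO-style training, the core idea is that several completions are sampled for the same prompt, each is scored by a reward function, and every reward is normalized against the mean and standard deviation of its own group. The sketch below illustrates that step with a simple exact-match reward for GSM8K-style numeric answers. It is a minimal illustration and not code from this repository: the function names `extract_final_number`, `correctness_reward`, and `group_relative_advantages`, and the 0/1 reward scheme, are assumptions made for the example.

```python
import re
from statistics import mean, pstdev


def extract_final_number(text):
    """Return the last number found in text as a string, or None if there is none."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    # Drop thousands separators such as "1,000" before comparison.
    return matches[-1].replace(",", "")


def correctness_reward(completion, gold_answer):
    """Hypothetical reward: 1.0 if the final number equals the gold answer, else 0.0."""
    predicted = extract_final_number(completion)
    if predicted is None:
        return 0.0
    return 1.0 if float(predicted) == float(gold_answer) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt whose gold answer is 18.
completions = [
    "She sells 9 eggs at $2 each, so the answer is 18.",
    "The total is 16.",
    "I am not sure.",
    "9 * 2 = 18",
]
rewards = [correctness_reward(c, "18") for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.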
