📝 Awesome Long-CoT Data

📚 Overview

A curated collection of resources for generating Long Chain-of-Thought (CoT) data, essential for complex reasoning and multi-step problem solving in AI systems.

The repository focuses on four key research directions:

🛠️ Prompt Engineering & Composition: Focuses on designing prompts (e.g., step-by-step decomposition, context augmentation, role-playing), combining short CoT into longer reasoning chains, and compositional strategies (e.g., multi-tool integration) to guide LLMs in generating longer, more coherent reasoning chains.
🔄 Feedback & Regeneration: Explores iterative refinement mechanisms for improving initial CoT outputs, leveraging external feedback/critique (human annotation, model self-evaluation, rule-based validation) to enhance logical rigor and factual consistency.
🎮 Reinforcement Learning Approaches: Investigates reinforcement learning frameworks (e.g., PPO, GRPO, R1-like) to align LLM-generated long reasoning chains with specific objectives through reward and policy optimization.
🎓 Knowledge Distillation: Addresses methods to transfer long-chain reasoning capabilities from large models/R1-models to lightweight models, including CoT data curation, distillation algorithms, and efficiency optimization for deployment.
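As a rough illustration of the Feedback & Regeneration direction above, the sketch below regenerates a reasoning chain until it passes a rule-based check, feeding the critique back into the next attempt. Everything here is an assumption made for illustration: `toy_generate` is a hypothetical stand-in for an LLM call, and the two rules in `rule_based_check` are arbitrary examples, not a method taken from any listed paper.

```python
from typing import Callable, List, Optional, Tuple

Chain = List[str]


def rule_based_check(chain: Chain, expected_answer: str) -> Optional[str]:
    """Return a critique if the chain breaks a rule, or None if it passes."""
    if len(chain) < 3:
        return "Reasoning is too short; show at least three steps."
    if not chain[-1].strip().endswith(expected_answer):
        return f"Final step does not end with the expected answer {expected_answer}."
    return None


def regenerate_until_valid(
    generate: Callable[[str, Optional[str]], Chain],
    question: str,
    expected_answer: str,
    max_rounds: int = 3,
) -> Tuple[Optional[Chain], int]:
    """Call `generate` repeatedly, passing the previous critique as feedback."""
    critique: Optional[str] = None
    for round_idx in range(1, max_rounds + 1):
        chain = generate(question, critique)
        critique = rule_based_check(chain, expected_answer)
        if critique is None:
            return chain, round_idx
    return None, max_rounds


def toy_generate(question: str, critique: Optional[str]) -> Chain:
    """Hypothetical stand-in for an LLM: answers tersely until critiqued."""
    if critique is None:
        return ["17 + 25 = 42"]
    return [
        "Split 25 into 20 + 5.",
        "17 + 20 = 37.",
        "37 + 5 = 42, so the answer is 42",
    ]


chain, rounds = regenerate_until_valid(toy_generate, "What is 17 + 25?", "42")
```

In this toy run the first attempt is rejected as too short and the second, longer chain is accepted. A real pipeline would replace the validator with model self-evaluation, human annotation, or task-specific rules.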

Each section curates papers, datasets, code implementations, and case studies to help researchers analyze technical pathways and optimization strategies for long-chain reasoning. Contributions and suggestions are welcome to enrich this open-knowledge hub and advance the frontiers of LLM reasoning capabilities!

🧠 Method Categories

🛠️ Prompt Engineering & Composition

2023.05 "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models". [Paper]
2023.05 "LogiCoT: Logical Chain-of-Thought Instruction Tuning". [Paper]
2023.10 "Answering Questions by Meta-Reasoning over Multiple Chains of Thought". [Paper]
2024.01 "The Impact of Reasoning Step Length on Large Language Models". [Paper]
2024.02 "Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes". [Paper]
2024.07 "Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models". [Paper][Code]
2024.09 "O1 Replication Journey: A Strategic Progress Report – Part 1". [Paper]
2025.01 "Kimi k1.5: Scaling Reinforcement Learning with LLMs". [Paper]
2025.01 "rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking". [Paper]
2025.02 "Self-rewarding correction for mathematical reasoning". [Paper]
2025.02 "BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation". [Paper]
2025.02 "Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning". [Paper]
2025.03 "START: Self-taught Reasoner with Tools". [Paper]
2025.03 "PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models". [Paper]
2025.03 "Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models". [Paper]

🔄 Feedback & Regeneration

2023.03 "Reflexion: Language Agents with Verbal Reinforcement Learning". [Paper][Code]
2023.05 "Improving Factuality and Reasoning in Language Models through Multiagent Debate". [Paper][Code]
2023.05 "SELF-REFINE: Iterative Refinement with Self-Feedback". [Paper]
2024.05 "Self-Reflection in LLM Agents: Effects on Problem-Solving Performance". [Paper]
2024.05 "Large Language Models Can Self-Correct with Key Condition Verification". [Paper][Code]
2024.07 "DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning". [Paper][Code]
2024.11 "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". [Paper][Code]
2025.02 "FastMCTS: A Simple Sampling Strategy for Data Synthesis". [Paper]
2025.03 "Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs". [Paper]
2025.03 "Towards Widening The Distillation Bottleneck for Reasoning Models". [Paper]

🎮 Reinforcement Learning Approaches

2024.01 "ReFT: Reasoning with Reinforced Fine-Tuning". [Paper][Code]
2024.12 "Offline Reinforcement Learning for LLM Multi-Step Reasoning". [Paper][Code]
2025.01 "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning". [Paper][Code]
2025.01 "7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient". [Paper][Code]
2025.01 "Open-R1: a fully open reproduction of DeepSeek-R1". [Paper][Code]
2025.01 "Clean, minimal, accessible reproduction of DeepSeek R1-Zero". [Code]
2025.02 "There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study". [Paper][Code]
2025.02 "Demystifying Long Chain-of-Thought Reasoning in LLMs". [Paper][Code]
2025.02 "DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL". [Paper][Code]
2025.02 "Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning". [Paper][Code]
2025.02 "LIMR: Less is More for RL Scaling". [Paper][Code]
2025.03 "QwQ-32B: Embracing the Power of Reinforcement Learning". [Paper]
2025.03 "DAPO: an Open-Source LLM Reinforcement Learning System at Scale". [Paper][Code][Data]

🎓 Knowledge Distillation

2024.11 "O1 Replication Journey – Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?". [Paper]
2024.12 "B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners". [Paper]
2025.01 "RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems?". [Paper]
2025.02 "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate". [Paper]
2025.03 "Towards Widening The Distillation Bottleneck for Reasoning Models". [Paper]
2025.03 "s1: Simple test-time scaling". [Paper]
2025.03 "Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation". [Paper]
2025.03 "1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training". [Paper][Data]

🤝 Contributing

Contributions welcome! Please:

Check for existing issues/pull requests
Open an issue for discussion before major changes
Keep entries chronological within categories
Maintain consistent formatting
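The last two guidelines can be checked mechanically before opening a pull request. The following is a minimal sketch, assuming entries follow the `YYYY.MM "Title". [Link]...` shape used in the lists above; the regular expression and the `check_entries` helper are hypothetical and not part of this repository.

```python
import re
from typing import List, Optional, Tuple

# Assumed entry shape: 2023.05 "Title". [Paper][Code]
ENTRY_RE = re.compile(r'^(\d{4})\.(\d{2}) "(.+)"\. ((?:\[\w+\])+)$')


def check_entries(lines: List[str]) -> List[str]:
    """Report entries that break the format or the chronological order."""
    problems: List[str] = []
    previous: Optional[Tuple[int, int]] = None
    for number, line in enumerate(lines, start=1):
        match = ENTRY_RE.match(line.strip())
        if match is None:
            problems.append(f"line {number}: does not match the entry format")
            continue
        date = (int(match.group(1)), int(match.group(2)))
        if previous is not None and date < previous:
            problems.append(f"line {number}: out of chronological order")
        previous = date
    return problems


sample = [
    '2023.05 "SELF-REFINE: Iterative Refinement with Self-Feedback". [Paper]',
    '2023.03 "Reflexion: Language Agents with Verbal Reinforcement Learning". [Paper][Code]',
    '2025.03 "s1: Simple test-time scaling" [Paper]',
]
problems = check_entries(sample)
```

On this sample the second entry is flagged as out of order and the third as malformed because the period after the closing quote is missing. Run the check once per category, since ordering applies within each category.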
