
Leey21/Awesome-Long-CoT-Data

📝 Awesome Long-CoT Data


📚 Overview

A curated collection of resources for generating Long Chain-of-Thought (CoT) data, essential for complex reasoning and multi-step problem solving in AI systems.

The repository focuses on four key research directions:

  • 🛠️ Prompt Engineering & Composition: Covers prompt design (e.g., step-by-step decomposition, context augmentation, role-playing), combining short CoTs into longer reasoning chains, and compositional strategies (e.g., multi-tool integration) that guide LLMs to generate longer, more coherent reasoning chains.
  • 🔄 Feedback & Regeneration: Explores iterative refinement mechanisms for improving initial CoT outputs, leveraging external feedback/critique (human annotation, model self-evaluation, rule-based validation) to enhance logical rigor and factual consistency.
  • 🎮 Reinforcement Learning Approaches: Investigates reinforcement learning frameworks (e.g., PPO, GRPO, R1-like training) that align LLM-generated long reasoning chains with specific objectives through reward design and policy optimization.
  • 🎓 Knowledge Distillation: Addresses methods for transferring long-chain reasoning capabilities from large models or R1-style models to lightweight models, including CoT data curation, distillation algorithms, and efficiency optimization for deployment.

Each section curates papers, datasets, code implementations, and case studies to help researchers analyze technical pathways and optimization strategies for long-chain reasoning. Contributions and suggestions are welcome to enrich this open-knowledge hub and advance the frontiers of LLM reasoning capabilities!

🧠 Method Categories

🛠️ Prompt Engineering & Composition

  • 2023.05 "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models". [Paper]
  • 2023.05 "LogiCoT: Logical Chain-of-Thought Instruction Tuning". [Paper]
  • 2023.10 "Answering Questions by Meta-Reasoning over Multiple Chains of Thought". [Paper]
  • 2024.01 "The Impact of Reasoning Step Length on Large Language Models". [Paper]
  • 2024.02 "Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes". [Paper]
  • 2024.07 "Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models". [Paper][Code]
  • 2024.09 "O1 Replication Journey: A Strategic Progress Report – Part 1". [Paper]
  • 2025.01 "Kimi k1.5: Scaling Reinforcement Learning with LLMs". [Paper]
  • 2025.01 "rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking". [Paper]
  • 2025.02 "Self-rewarding correction for mathematical reasoning". [Paper]
  • 2025.02 "BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation". [Paper]
  • 2025.02 "Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning". [Paper]
  • 2025.03 "START: Self-taught Reasoner with Tools". [Paper]
  • 2025.03 "PROMPTCOT: Synthesizing Olympiad-level Problems for Mathematical". [Paper]
  • 2025.03 "Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models". [Paper]

🔄 Feedback & Regeneration

  • 2023.03 "Reflexion: Language Agents with Verbal Reinforcement Learning". [Paper][Code]
  • 2023.05 "Improving Factuality and Reasoning in Language Models through Multiagent Debate". [Paper][Code]
  • 2023.05 "SELF-REFINE: Iterative Refinement with Self-Feedback". [Paper]
  • 2024.05 "Self-Reflection in LLM Agents: Effects on Problem-Solving Performance". [Paper]
  • 2024.05 "Large Language Models Can Self-Correct with Key Condition Verification". [Paper][Code]
  • 2024.07 "DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning". [Paper][Code]
  • 2024.11 "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". [Paper][Code]
  • 2025.02 "FastMCTS: A Simple Sampling Strategy for Data Synthesis". [Paper]
  • 2025.03 "Cognitive behaviors that enable self-improving reasoners, or, four habits of highly effective stars". [Paper]
  • 2025.03 "Towards Widening The Distillation Bottleneck for Reasoning Models". [Paper]

🎮 Reinforcement Learning Approaches

  • 2024.01 "ReFT: Reasoning with Reinforced Fine-Tuning". [Paper][Code]
  • 2024.12 "Offline Reinforcement Learning for LLM Multi-Step Reasoning". [Paper][Code]
  • 2025.01 "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning". [Paper][Code]
  • 2025.01 "7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient". [Paper][Code]
  • 2025.01 "Open-R1: a fully open reproduction of DeepSeek-R1". [Paper][Code]
  • 2025.01 "Clean, minimal, accessible reproduction of DeepSeek R1-Zero". [Code]
  • 2025.02 "There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study". [Paper][Code]
  • 2025.02 "Demystifying Long Chain-of-Thought Reasoning in LLMs". [Paper][Code]
  • 2025.02 "DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL". [Paper][Code]
  • 2025.02 "Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning". [Paper][Code]
  • 2025.02 "LIMR: Less is More for RL Scaling". [Paper][Code]
  • 2025.03 "QwQ-32B: Embracing the Power of Reinforcement Learning". [Paper]
  • 2025.03 "DAPO: an Open-Source LLM Reinforcement Learning System at Scale". [Paper][Code][Data]

🎓 Knowledge Distillation

  • 2024.11 "O1 Replication Journey – Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?". [Paper]
  • 2024.12 "B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners". [Paper]
  • 2025.01 "RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems?". [Paper]
  • 2025.02 "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate". [Paper]
  • 2025.03 "Towards Widening The Distillation Bottleneck for Reasoning Models". [Paper]
  • 2025.03 "s1: Simple test-time scaling". [Paper]
  • 2025.03 "Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation". [Paper]
  • 2025.03 "1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training". [Paper][Data]

🤝 Contributing

Contributions welcome! Please:

  1. Check for existing issues/pull requests
  2. Open an issue for discussion before major changes
  3. Keep entries chronological within categories
  4. Maintain consistent formatting
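
To check rule 3 before opening a pull request, the ordering can be verified with a few lines of Python. This is only a sketch and not part of the repository: it assumes every entry starts with a bullet followed by a `YYYY.MM` date, as the entries above do, and that any other non-blank line (such as a category heading) starts a new category. The function name `find_out_of_order` and the placeholder titles in `sample` are hypothetical.

```python
import re

# Assumed entry shape, based on this list: `  • 2025.03 "Title". [Paper]`
# Markdown bullets (`-` or `*`) are accepted as well.
ENTRY_RE = re.compile(r"^\s*(?:•|[-*])\s+(\d{4})\.(\d{2})\s")


def find_out_of_order(lines):
    """Return (previous, current) entry pairs where the date goes backwards.

    The comparison restarts at every non-blank line that is not a dated
    entry (for example a category heading), so each category is checked
    independently. Blank lines are ignored.
    """
    problems = []
    prev_key, prev_line = None, None
    for line in lines:
        match = ENTRY_RE.match(line)
        if match is None:
            if line.strip():
                # Heading or other text: start a fresh category.
                prev_key, prev_line = None, None
            continue
        key = (int(match.group(1)), int(match.group(2)))
        if prev_key is not None and key < prev_key:
            problems.append((prev_line.strip(), line.strip()))
        prev_key, prev_line = key, line
    return problems


# Hypothetical entries, used only to exercise the checker.
sample = [
    "🎮 Reinforcement Learning Approaches",
    '  • 2024.12 "Paper A". [Paper]',
    '  • 2025.02 "Paper B". [Paper]',
    '  • 2025.01 "Paper C". [Paper]',
    "🎓 Knowledge Distillation",
    '  • 2024.11 "Paper D". [Paper]',
]

problems = find_out_of_order(sample)
for prev, cur in problems:
    print(f"out of order: {cur!r} appears after {prev!r}")
```

Entries that share the same month are treated as correctly ordered, since the dates carry no day component.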
