SafeRL-Lab/AI-Agent-Reasoning-Baselines

AI-Agent-Reasoning-Papers

Reasoning Paper List

2025

  • AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy, Paper (June 13, 2025)
  • Spurious Rewards: Rethinking Training Signals in RLVR, Paper (June 12, 2025)
  • RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought, Paper (June 4, 2025)
  • The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, Paper (June 7, 2025)
  • Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning, Paper (June 2025)
  • Does Thinking More always Help? Understanding Test‑Time Scaling in Reasoning Models, Paper (June 2025)
  • The Illusion of Thinking: Comment on Shojaee et al., Paper (June 10, 2025)
  • ReMA: Learning to Meta-think for LLMs with Multi-agent Reinforcement Learning, Paper (May 27, 2025)
  • WorkForceAgent‑R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning, Paper (May 22, 2025)
  • AdaptThink: Reasoning Models Can Learn When to Think, Paper (May 19, 2025)
  • Learning When to Think: Shaping Adaptive Reasoning in R1‑Style Models via Multi‑Stage RL, Paper (May 16, 2025)
  • Llama-Nemotron: Efficient Reasoning Models, Paper (May 15, 2025)
  • Chain‑of‑Thought Tokens are Computer Program Variables, Paper (May 2025)
  • SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model, Paper (April 13, 2025)
  • Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?, Paper (April 18, 2025)
  • Climbing the Ladder of Reasoning: What LLMs Can—and Still Can’t—Solve after SFT?, Paper (April 16, 2025)
  • Inference‑Time Scaling for Generalist Reward Modeling, Paper (April 3, 2025)
  • Test‑Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards, Paper (March 2025)
  • SimpleRL‑Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild, Paper (March 2025)
  • What Makes a Reward Model a Good Teacher? An Optimization Perspective, Paper (March 2025)
  • Sketch‑of‑Thought: Efficient LLM Reasoning with Adaptive Cognitive‑Inspired Sketching, Paper (March 2025)
  • All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine‑Tuning, Paper (March 2025)
  • Reward Shaping to Mitigate Reward Hacking in RLHF, Paper (February 2025)
  • Reward-Guided Speculative Decoding for Efficient LLM Reasoning, Paper (February 14, 2025)
  • Chain of Draft: Thinking Faster by Writing Less, Paper (February 2025)
  • ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates, Paper (February 2025)
  • Step Back to Leap Forward: Self‑Backtracking for Boosting Reasoning of Language Models, Paper (February 2025)
  • SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post‑training, Paper (January 2025)
  • Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization, Paper (January 2025)
  • LLMs Can Plan Only If We Tell Them, Paper (January 2025)
  • Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain‑of‑Thought, Paper (January 2025)

2024

  • SegLLM: Multi-round Reasoning Segmentation, Paper (October 24, 2024)
  • Automatic Curriculum Expert Iteration for Reliable LLM Reasoning, Paper (October 2024)
  • Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization, Paper (July 2024)
  • RouteLLM: Learning to Route LLMs with Preference Data, Paper (June 26, 2024)
  • Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study, Paper (June 2024)
  • Quiet‑STaR: Language Models Can Teach Themselves to Think Before Speaking, Paper (March 2024)
  • Self‑Rewarding Language Models, Paper (January 2024)
  • The Impact of Reasoning Step Length on Large Language Models, Paper (January 2024)

2023

  • Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models, Paper (August 2023)
  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models, Paper (May 2023)

2022

  • STaR: Bootstrapping Reasoning With Reasoning, Paper (March 2022)
  • Self‑Consistency Improves Chain of Thought Reasoning in Language Models, Paper (March 2022)
