Paper Information
Title: InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
Authors: Yuchen Yan, Liang Jiang, Jin Jiang, Shuaicheng Li, Zujie Wen, et al. (10 authors)
Published: 2026-02-06
Category: cs.AI
PDF: Download
Abstract
Large reasoning models achieve strong performance by scaling inference-time chain-of-thought, but this paradigm suffers from quadratic cost, context length limits, and degraded reasoning due to lost-in-the-middle effects. Iterative reasoning mitigates these issues by periodically summarizing intermediate thoughts, yet existing methods rely on supervised learning or fixed heuristics and fail to optimize when to summarize, what to preserve, and how to resume reasoning. We propose InftyThink+, an end-to-end reinforcement learning framework that optimizes the entire iterative reasoning trajectory, building on model-controlled iteration boundaries and explicit summarization. InftyThink+ adopts a two-stage training scheme with supervised cold-start followed by trajectory-level reinforcement learning, enabling the model to learn strategic summarization and continuation decisions. Experiments on DeepSeek-R1-Distill-Qwen-1.5B show that InftyThink+ improves accuracy by 21% on AIME24 and outperforms conventional long chain-of-thought reinforcement learning by a clear margin, while also generalizing better to out-of-distribution benchmarks. Moreover, InftyThink+ significantly reduces inference latency and accelerates reinforcement learning training, demonstrating improved reasoning efficiency alongside stronger performance.
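As a rough illustration of the iterative paradigm the abstract describes (reason in bounded segments, let the model decide when to summarize, then resume from the summary alone), here is a minimal Python sketch. The function name, the `<summary>`/`<answer>` tags, and the iteration budget are assumptions made for illustration; they are not taken from the paper or its released code.

```python
# Minimal sketch of an iterative-reasoning inference loop with
# model-controlled iteration boundaries and explicit summarization.
# `generate` is a hypothetical stand-in for any LLM call; the special
# tags <summary>...</summary> and <answer>...</answer> are illustrative.

from typing import Callable


def infinite_horizon_reason(
    question: str,
    generate: Callable[[str], str],  # model call: prompt -> continuation
    max_iterations: int = 8,
) -> str:
    """Run bounded-length reasoning rounds, carrying only a summary forward."""
    summary = ""
    for _ in range(max_iterations):
        prompt = (
            f"Problem: {question}\n"
            f"Summary of previous reasoning: {summary or '(none)'}\n"
            "Continue reasoning. End with <summary>...</summary> to pause, "
            "or <answer>...</answer> if you are done.\n"
        )
        segment = generate(prompt)

        if "<answer>" in segment:  # model decided to stop and answer
            return segment.split("<answer>")[1].split("</answer>")[0].strip()

        if "<summary>" in segment:  # model chose to summarize and continue
            summary = segment.split("<summary>")[1].split("</summary>")[0].strip()
        else:  # fallback: carry the raw tail of the segment forward
            summary = segment[-1024:]

    return summary  # budget exhausted; return the latest summary as best effort
```

In the paper's framing, deciding when to emit a summary, what to preserve in it, and how to resume is exactly what the trajectory-level RL stage is meant to optimize; the sketch above only fixes the inference-time interface.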
Why Recommended
Paper 0: InftyThink+'s RL-optimized iterative reasoning framework may spark discussion on when a model should stop thinking; it could be compared and contrasted with the activation-steering resistance mechanism in Paper 6.
Discussion
Please share your views on this paper:
- What are the paper's main innovations?
- Is the method sound?
- Are the experimental results credible?
- What could be improved?
Automatically created by arXiv Monitor