[Paper Discussion] InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning

## Paper Information

**Title**: [InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning](https://arxiv.org/abs/2602.06960v1)
**Authors**: Yuchen Yan, Liang Jiang, Jin Jiang, Shuaicheng Li, Zujie Wen, and others (10 authors in total)
**Published**: 2026-02-06
**Category**: cs.AI
**PDF**: [Download](https://arxiv.org/pdf/2602.06960v1.pdf)

## Summary

Large reasoning models achieve strong performance by scaling inference-time chain-of-thought, but this paradigm suffers from quadratic cost, context length limits, and degraded reasoning due to lost-in-the-middle effects. Iterative reasoning mitigates these issues by periodically summarizing intermediate thoughts, yet existing methods rely on supervised learning or fixed heuristics and fail to optimize when to summarize, what to preserve, and how to resume reasoning. We propose InftyThink+, an end-to-end reinforcement learning framework that optimizes the entire iterative reasoning trajectory, building on model-controlled iteration boundaries and explicit summarization. InftyThink+ adopts a two-stage training scheme with supervised cold-start followed by trajectory-level reinforcement learning, enabling the model to learn strategic summarization and continuation decisions. Experiments on DeepSeek-R1-Distill-Qwen-1.5B show that InftyThink+ improves accuracy by 21% on AIME24 and outperforms conventional long chain-of-thought reinforcement learning by a clear margin, while also generalizing better to out-of-distribution benchmarks. Moreover, InftyThink+ significantly reduces inference latency and accelerates reinforcement learning training, demonstrating improved reasoning efficiency alongside stronger performance.
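The summarize-and-resume loop described above can be illustrated with a minimal sketch. This is not the paper's implementation: `Step`, `iterative_reason`, and `toy_generate` are hypothetical names, the "model" is a toy stand-in that adds one number per iteration, and the reinforcement learning stage is omitted entirely. The sketch only shows the inference-time control flow, assuming the model itself decides at each iteration whether to emit a summary and continue or to emit a final answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """Output of one reasoning iteration (hypothetical structure)."""
    reasoning: str
    summary: Optional[str] = None   # set when the model chooses to continue
    answer: Optional[str] = None    # set when the model chooses to stop


def iterative_reason(
    question: str,
    generate: Callable[[str, str], Step],
    max_iters: int = 8,
) -> Tuple[Optional[str], List[Step]]:
    """Run summarize-and-resume reasoning until an answer is emitted."""
    summary = ""        # carried state; earlier full reasoning is dropped
    trajectory: List[Step] = []
    for _ in range(max_iters):
        # Each iteration sees only the question and the latest summary,
        # so the context stays bounded regardless of total reasoning length.
        step = generate(question, summary)
        trajectory.append(step)
        if step.answer is not None:     # model decided to stop
            return step.answer, trajectory
        summary = step.summary or ""    # model decided to summarize and go on
    return None, trajectory             # iteration budget exhausted


def toy_generate(question: str, summary: str) -> Step:
    """Toy stand-in for a language model: adds one number per iteration."""
    numbers = [int(tok) for tok in question.split()]
    # The summary encodes "how many numbers consumed, running total".
    done, total = (int(x) for x in summary.split(",")) if summary else (0, 0)
    total += numbers[done]
    done += 1
    if done == len(numbers):
        return Step(reasoning=f"added last number, total={total}", answer=str(total))
    return Step(reasoning=f"added number {done}", summary=f"{done},{total}")


answer, trajectory = iterative_reason("3 5 7", toy_generate)
print(answer, len(trajectory))  # prints "15 3"
```

In this sketch the summary is the only state that survives between iterations, which is why the choices of when to summarize and what to preserve matter: anything left out of the summary is unavailable to later iterations.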

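## Reason for Recommendation

Paper 0: InftyThink+'s RL-optimized iterative reasoning framework may prompt discussion of "when should a model stop thinking", and could be compared with the activation-steering resistance mechanism in Paper 6.

## Discussion

Please share your thoughts on this paper:
- What are the paper's novel contributions?
- Is the method sound?
- Are the experimental results credible?
- What could be improved?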

---
_Automatically created by arXiv Monitor_
