
[Paper Discussion] InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning #68

@gqy20

Paper Information

Title: InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
Authors: Yuchen Yan, Liang Jiang, Jin Jiang, Shuaicheng Li, Zujie Wen, and others (10 authors in total)
Published: 2026-02-06
Category: cs.AI
PDF: Download

Abstract

Large reasoning models achieve strong performance by scaling inference-time chain-of-thought, but this paradigm suffers from quadratic cost, context length limits, and degraded reasoning due to lost-in-the-middle effects. Iterative reasoning mitigates these issues by periodically summarizing intermediate thoughts, yet existing methods rely on supervised learning or fixed heuristics and fail to optimize when to summarize, what to preserve, and how to resume reasoning. We propose InftyThink+, an end-to-end reinforcement learning framework that optimizes the entire iterative reasoning trajectory, building on model-controlled iteration boundaries and explicit summarization. InftyThink+ adopts a two-stage training scheme with supervised cold-start followed by trajectory-level reinforcement learning, enabling the model to learn strategic summarization and continuation decisions. Experiments on DeepSeek-R1-Distill-Qwen-1.5B show that InftyThink+ improves accuracy by 21% on AIME24 and outperforms conventional long chain-of-thought reinforcement learning by a clear margin, while also generalizing better to out-of-distribution benchmarks. Moreover, InftyThink+ significantly reduces inference latency and accelerates reinforcement learning training, demonstrating improved reasoning efficiency alongside stronger performance.
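To make the iterative-reasoning idea in the abstract easier to discuss, here is a minimal sketch of the inference loop only: reason for a while, emit a summary, then resume from the question plus that summary. It is not the paper's implementation and does not cover the cold-start or reinforcement learning stages. The names `Step`, `iterative_reason`, and `toy_generate`, as well as the `[summary]` prompt format, are assumptions made up for illustration; `toy_generate` is a deterministic stand-in for a language model.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    """One reasoning iteration (hypothetical structure, not from the paper)."""
    thought: str            # reasoning text produced in this iteration
    summary: Optional[str]  # summary carried into the next iteration, if any
    answer: Optional[str]   # final answer when the model decides to stop


def iterative_reason(
    question: str,
    generate: Callable[[str], Step],
    max_iters: int = 8,
) -> Tuple[Optional[str], List[Step]]:
    """Run summarize-and-resume reasoning until an answer or the iteration cap."""
    summary: Optional[str] = None
    history: List[Step] = []
    for _ in range(max_iters):
        # Each prompt holds only the question and the latest summary, so the
        # context stays bounded no matter how long the total reasoning is.
        prompt = question if summary is None else f"{question}\n[summary] {summary}"
        step = generate(prompt)
        history.append(step)
        if step.answer is not None:
            return step.answer, history
        summary = step.summary
    return None, history


def toy_generate(prompt: str) -> Step:
    """Stand-in for a model: adds at most two integers per iteration."""
    question, _, carried = prompt.partition("\n[summary] ")
    limit = int(question.split()[-1])
    # The summary encodes "next integer to add, running total".
    nxt, total = (int(x) for x in carried.split(",")) if carried else (1, 0)
    chunk = list(range(nxt, min(nxt + 2, limit + 1)))
    total += sum(chunk)
    thought = f"add {chunk} -> {total}"
    if chunk[-1] == limit:
        return Step(thought, None, str(total))
    return Step(thought, f"{chunk[-1] + 1},{total}", None)


answer, history = iterative_reason("Sum the integers from 1 to 6", toy_generate)
print(answer, len(history))  # prints: 21 3
```

In this sketch the stand-in model decides when to stop and what to write into the summary, which loosely mirrors the "model-controlled iteration boundaries" the abstract mentions. How a trained model would actually make those decisions is exactly what the paper's RL stage is said to optimize, and is not represented here.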

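Reason for Recommendation

Paper 0: InftyThink+'s RL-optimized iterative reasoning framework may prompt discussion of "when should a model stop thinking", and could be analyzed in comparison with the activation-steering resistance mechanism in Paper 6.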

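Discussion

Please share your views on this paper:

  • What are the paper's novel contributions?
  • Is the method sound?
  • Are the experimental results credible?
  • What could be improved?

Automatically created by arXiv Monitor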
