Learning Adaptive Parallel Reasoning with Language Models
April 21, 2025
Authors: Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr
cs.AI
Abstract
Scaling inference-time computation has substantially improved the reasoning
capabilities of language models. However, existing methods have significant
limitations: serialized chain-of-thought approaches generate overly long
outputs, leading to increased latency and exhausted context windows, while
parallel methods such as self-consistency suffer from insufficient
coordination, resulting in redundant computations and limited performance
gains. To address these shortcomings, we propose Adaptive Parallel Reasoning
(APR), a novel reasoning framework that enables language models to orchestrate
both serialized and parallel computations end-to-end. APR generalizes existing
reasoning methods by enabling adaptive multi-threaded inference using spawn()
and join() operations. A key innovation is our end-to-end reinforcement
learning strategy, optimizing both parent and child inference threads to
enhance task success rate without requiring predefined reasoning structures.
Experiments on the Countdown reasoning task demonstrate significant benefits of
APR: (1) higher performance within the same context window (83.4% vs. 60.0% at
4k context); (2) superior scalability with increased computation (80.1% vs.
66.6% at 20k total tokens); (3) improved accuracy at equivalent latency (75.2%
vs. 57.3% at approximately 5,000ms). APR represents a step towards enabling
language models to autonomously optimize their reasoning processes through
adaptive allocation of computation.
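
As a rough illustration of the spawn()/join() pattern on a Countdown-style problem (combine the given numbers with +, -, *, / to reach a target), the following toy sketch may help. It is not the paper's method: in APR a language model decides when to spawn and what to return, and the policy is trained with reinforcement learning, whereas here a hand-written search stands in for the model. All names (`parent_search`, `child_search`, `candidate_steps`, `max_children`) are hypothetical, and Python threads are used only to show the control flow, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def candidate_steps(a, b):
    """Return (value, description) pairs for combining two numbers."""
    hi, lo = max(a, b), min(a, b)
    steps = [
        (a + b, f"{a}+{b}={a + b}"),
        (a * b, f"{a}*{b}={a * b}"),
        (hi - lo, f"{hi}-{lo}={hi - lo}"),
    ]
    # Only allow exact integer division.
    if lo != 0 and hi % lo == 0:
        steps.append((hi // lo, f"{hi}/{lo}={hi // lo}"))
    return steps


def child_search(numbers, target, trail):
    """Child thread: serial depth-first search over one sub-problem.

    Returns the list of steps that reaches the target, or None.
    """
    if len(numbers) == 1:
        return trail if numbers[0] == target else None
    for i, j in combinations(range(len(numbers)), 2):
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        for value, step in candidate_steps(numbers[i], numbers[j]):
            found = child_search(rest + [value], target, trail + [step])
            if found is not None:
                return found
    return None


def parent_search(numbers, target, max_children=4):
    """Parent thread: pick first steps, spawn one child per step, join results."""
    if len(numbers) == 1:
        return [] if numbers[0] == target else None

    # Each possible first step defines an independent sub-problem.
    subproblems = []
    for i, j in combinations(range(len(numbers)), 2):
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        for value, step in candidate_steps(numbers[i], numbers[j]):
            subproblems.append((rest + [value], [step]))

    with ThreadPoolExecutor(max_workers=max_children) as pool:
        # "spawn": hand each sub-problem to a child thread.
        futures = [
            pool.submit(child_search, nums, target, trail)
            for nums, trail in subproblems
        ]
        # "join": the parent only sees each child's short result,
        # not the child's full search trace.
        results = [f.result() for f in futures]

    return next((r for r in results if r is not None), None)


if __name__ == "__main__":
    print(parent_search([3, 5, 7], 22))
```

The design point this sketch is meant to mirror is that the parent's own context stays small: it holds only the sub-problem descriptions and the joined results, while the long exploration happens inside the children.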