Efficient Inference for Large Reasoning Models: A Survey
March 29, 2025
Authors: Yue Liu, Jiaying Wu, Yufei He, Hongcheng Gao, Hongyu Chen, Baolong Bi, Jiaheng Zhang, Zhiqi Huang, Bryan Hooi
cs.AI
Abstract
Large Reasoning Models (LRMs) significantly improve the reasoning ability of
Large Language Models (LLMs) by learning to reason, exhibiting promising
performance in complex task-solving. However, their deliberative reasoning
process leads to inefficiencies in token usage, memory consumption, and
inference time. Thus, this survey provides a review of efficient inference
methods designed specifically for LRMs, focusing on mitigating token
inefficiency while preserving the reasoning quality. First, we introduce a
taxonomy to group the recent methods into two main categories: (a) explicit
compact Chain-of-Thought (CoT), which reduces tokens while keeping the explicit
reasoning structure, and (b) implicit latent CoT, which encodes reasoning steps
within hidden representations instead of explicit tokens, and we discuss the
strengths and weaknesses of each. Then, we conduct empirical analyses of existing
methods from the perspectives of performance and efficiency. We also present open
challenges in this field, including human-centric controllable reasoning, the
trade-off between reasoning interpretability and efficiency, ensuring the safety
of efficient reasoning, and broadening the applications of efficient reasoning. In
addition, we highlight key insights for enhancing LRMs' inference efficiency
via techniques such as model merging, new architectures, and agent routers. We
hope this work serves as a valuable guide, helping researchers overcome
challenges in this vibrant field. A collection of relevant papers is available at
https://github.com/yueliu1999/Awesome-Efficient-Inference-for-LRMs.