Efficient Inference for Large Reasoning Models: A Survey
March 29, 2025
Authors: Yue Liu, Jiaying Wu, Yufei He, Hongcheng Gao, Hongyu Chen, Baolong Bi, Jiaheng Zhang, Zhiqi Huang, Bryan Hooi
cs.AI
Abstract
Large Reasoning Models (LRMs) significantly improve the reasoning ability of
Large Language Models (LLMs) by learning to reason, exhibiting promising
performance in complex task-solving. However, their deliberative reasoning
process leads to inefficiencies in token usage, memory consumption, and
inference time. Thus, this survey provides a review of efficient inference
methods designed specifically for LRMs, focusing on mitigating token
inefficiency while preserving the reasoning quality. First, we introduce a
taxonomy to group the recent methods into two main categories: (a) explicit
compact Chain-of-Thought (CoT), which reduces tokens while keeping the explicit
reasoning structure, and (b) implicit latent CoT, which encodes reasoning steps
within hidden representations instead of explicit tokens, and we discuss the
strengths and weaknesses of each. Then, we conduct empirical analyses of existing
methods from the perspectives of performance and efficiency. We also present open
challenges in this field, including human-centric controllable reasoning, the
trade-off between reasoning interpretability and efficiency, ensuring the safety
of efficient reasoning, and broadening the applications of efficient reasoning. In
addition, we highlight key insights for enhancing LRMs' inference efficiency
via techniques such as model merging, new architectures, and agent routers. We
hope this work serves as a valuable guide, helping researchers overcome
challenges in this vibrant field. A collection of relevant papers is available at
https://github.com/yueliu1999/Awesome-Efficient-Inference-for-LRMs.