Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
January 16, 2025
Authors: Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li
cs.AI
Abstract
Language has long been conceived as an essential tool for human reasoning.
The breakthrough of Large Language Models (LLMs) has sparked significant
research interest in leveraging these models to tackle complex reasoning tasks.
Researchers have moved beyond simple autoregressive token generation by
introducing the concept of "thought" -- a sequence of tokens representing
intermediate steps in the reasoning process. This innovative paradigm enables
LLMs to mimic complex human reasoning processes, such as tree search and
reflective thinking. Recently, an emerging trend of learning to reason has
applied reinforcement learning (RL) to train LLMs to master reasoning
processes. This approach enables the automatic generation of high-quality
reasoning trajectories through trial-and-error search algorithms, significantly
expanding LLMs' reasoning capacity by providing substantially more training
data. Furthermore, recent studies demonstrate that encouraging LLMs to "think"
with more tokens during test-time inference can further significantly boost
reasoning accuracy. Combined, train-time and test-time scaling open a new
research frontier -- a path toward Large Reasoning Models. The
introduction of OpenAI's o1 series marks a significant milestone in this
research direction. In this survey, we present a comprehensive review of recent
progress in LLM reasoning. We begin by introducing the foundational background
of LLMs and then explore the key technical components driving the development
of large reasoning models, with a focus on automated data construction,
learning-to-reason techniques, and test-time scaling. We also analyze popular
open-source projects aimed at building large reasoning models, and conclude
with open challenges and future research directions.
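One test-time scaling idea the abstract alludes to (spending more tokens to "think") can be illustrated with self-consistency: sample several reasoning traces for the same question and return the majority-vote answer. The sketch below is a minimal, runnable illustration under stated assumptions, not the survey's method: `generate_answer` is a hypothetical stub standing in for a stochastic LLM call that would sample a chain of thought and extract a final answer.

```python
from collections import Counter

def generate_answer(question: str, seed: int) -> str:
    """Hypothetical stub for a stochastic LLM call.

    A real implementation would sample a reasoning trace at
    nonzero temperature and parse out the final answer. Here,
    deterministic toy behaviour keeps the sketch runnable:
    most "samples" agree on one answer, a few deviate.
    """
    return "42" if seed % 4 != 0 else "41"

def self_consistency(question: str, n_samples: int = 8) -> str:
    """Test-time scaling via majority vote over sampled traces.

    Drawing more samples (a larger n_samples) spends more
    inference-time compute in exchange for a more reliable answer.
    """
    answers = [generate_answer(question, seed) for seed in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))  # majority answer: "42"
```

The same loop structure generalizes to other test-time strategies the survey covers (e.g., scoring candidates with a reward model instead of voting), with the sampling budget as the scaling knob.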