Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
January 16, 2025
Authors: Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li
cs.AI
Abstract
Language has long been conceived as an essential tool for human reasoning.
The breakthrough of Large Language Models (LLMs) has sparked significant
research interest in leveraging these models to tackle complex reasoning tasks.
Researchers have moved beyond simple autoregressive token generation by
introducing the concept of "thought" -- a sequence of tokens representing
intermediate steps in the reasoning process. This innovative paradigm enables
LLMs to mimic complex human reasoning processes, such as tree search and
reflective thinking. Recently, an emerging trend of learning to reason has
applied reinforcement learning (RL) to train LLMs to master reasoning
processes. This approach enables the automatic generation of high-quality
reasoning trajectories through trial-and-error search algorithms, significantly
expanding LLMs' reasoning capacity by providing substantially more training
data. Furthermore, recent studies demonstrate that encouraging LLMs to "think"
with more tokens during test-time inference can further significantly boost
reasoning accuracy. Combined, train-time and test-time scaling open a new
research frontier -- a path toward Large Reasoning Models. The
introduction of OpenAI's o1 series marks a significant milestone in this
research direction. In this survey, we present a comprehensive review of recent
progress in LLM reasoning. We begin by introducing the foundational background
of LLMs and then explore the key technical components driving the development
of large reasoning models, with a focus on automated data construction,
learning-to-reason techniques, and test-time scaling. We also analyze popular
open-source projects aimed at building large reasoning models, and conclude
with open challenges and future research directions.
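One test-time scaling idea the abstract alludes to (spending more tokens to "think") can be illustrated with self-consistency: sample several reasoning traces for the same question and return the majority-vote answer. The sketch below is a minimal, runnable illustration under stated assumptions, not the survey's method: `generate_answer` is a hypothetical stub standing in for a stochastic LLM call that would sample a chain of thought and extract a final answer.

```python
from collections import Counter

def generate_answer(question: str, seed: int) -> str:
    """Hypothetical stub for a stochastic LLM call.

    A real implementation would sample a reasoning trace at
    nonzero temperature and parse out the final answer. Here,
    deterministic toy behaviour keeps the sketch runnable:
    most "samples" agree on one answer, a few deviate.
    """
    return "42" if seed % 4 != 0 else "41"

def self_consistency(question: str, n_samples: int = 8) -> str:
    """Test-time scaling via majority vote over sampled traces.

    Drawing more samples (a larger n_samples) spends more
    inference-time compute in exchange for a more reliable answer.
    """
    answers = [generate_answer(question, seed) for seed in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))  # majority answer: "42"
```

The same loop structure generalizes to other test-time strategies the survey covers (e.g., scoring candidates with a reward model instead of voting), with the sampling budget as the scaling knob.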