Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
January 16, 2025
Authors: Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li
cs.AI
Abstract
Language has long been conceived as an essential tool for human reasoning.
The breakthrough of Large Language Models (LLMs) has sparked significant
research interest in leveraging these models to tackle complex reasoning tasks.
Researchers have moved beyond simple autoregressive token generation by
introducing the concept of "thought" -- a sequence of tokens representing
intermediate steps in the reasoning process. This innovative paradigm enables
LLMs to mimic complex human reasoning processes, such as tree search and
reflective thinking. Recently, an emerging trend of learning to reason has
applied reinforcement learning (RL) to train LLMs to master reasoning
processes. This approach enables the automatic generation of high-quality
reasoning trajectories through trial-and-error search algorithms, significantly
expanding LLMs' reasoning capacity by providing substantially more training
data. Furthermore, recent studies demonstrate that encouraging LLMs to "think"
with more tokens during test-time inference can further significantly boost
reasoning accuracy. Therefore, train-time and test-time scaling combine to
open a new research frontier -- a path toward Large Reasoning Models. The
introduction of OpenAI's o1 series marks a significant milestone in this
research direction. In this survey, we present a comprehensive review of recent
progress in LLM reasoning. We begin by introducing the foundational background
of LLMs and then explore the key technical components driving the development
of large reasoning models, with a focus on automated data construction,
learning-to-reason techniques, and test-time scaling. We also analyze popular
open-source projects aimed at building large reasoning models, and conclude with open
challenges and future research directions.
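The test-time scaling the abstract describes -- letting a model "think" with more tokens, or sample more trajectories, at inference time -- can be illustrated by one of its simplest instances: self-consistency, where several reasoning trajectories are sampled and the most common final answer is returned. The sketch below is a minimal illustration, not a method from the survey itself; `sample_answer` is a hypothetical stand-in for a call that samples one reasoning trajectory from an LLM and extracts its final answer.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(sample_answer: Callable[[], str], n_samples: int) -> str:
    """Self-consistency at test time: sample several independent
    reasoning trajectories and return the most frequent final answer.
    More samples (more test-time compute) generally means higher
    accuracy, at proportionally higher inference cost."""
    answers: List[str] = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


# Deterministic stand-in for an LLM sampler: the (assumed) correct
# answer "42" appears in 3 of 5 sampled trajectories.
_canned = iter(["42", "17", "42", "42", "17"])
result = majority_vote(lambda: next(_canned), n_samples=5)
print(result)
```

With a real model, `sample_answer` would issue a fresh stochastic generation (e.g., temperature > 0) per call; majority voting then filters out trajectories whose reasoning went astray, which is one concrete way extra inference-time tokens buy accuracy.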