Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Abstract

Language has long been regarded as an essential tool for human reasoning. The breakthrough of Large Language Models (LLMs) has sparked significant research interest in leveraging these models to tackle complex reasoning tasks. Researchers have moved beyond simple autoregressive token generation by introducing the concept of "thought": a sequence of tokens representing intermediate steps in the reasoning process. This paradigm enables LLMs to mimic complex human reasoning processes, such as tree search and reflective thinking. Recently, an emerging trend of learning to reason has applied reinforcement learning (RL) to train LLMs to master reasoning processes. This approach enables the automatic generation of high-quality reasoning trajectories through trial-and-error search algorithms, significantly expanding LLMs' reasoning capacity by providing substantially more training data. Furthermore, recent studies demonstrate that encouraging LLMs to "think" with more tokens during test-time inference can significantly boost reasoning accuracy further. Together, train-time and test-time scaling reveal a new research frontier: a path toward Large Reasoning Models. The introduction of OpenAI's o1 series marks a significant milestone in this research direction. In this survey, we present a comprehensive review of recent progress in LLM reasoning. We begin by introducing the foundational background of LLMs and then explore the key technical components driving the development of large reasoning models, with a focus on automated data construction, learning-to-reason techniques, and test-time scaling. We also analyze popular open-source projects for building large reasoning models, and conclude with open challenges and future research directions.
