A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
October 17, 2024
Authors: Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J. H. Liu
cs.AI
Abstract
Enabling Large Language Models (LLMs) to handle a wider range of complex
tasks (e.g., coding, math) has drawn great attention from many researchers. As
LLMs continue to evolve, merely increasing the number of model parameters
yields diminishing performance improvements and heavy computational costs.
Recently, OpenAI's o1 model has shown that inference strategies (i.e.,
Test-time Compute methods) can also significantly enhance the reasoning
capabilities of LLMs. However, the mechanisms behind these methods are still
unexplored. In our work, to investigate the reasoning patterns of o1, we
compare o1 with existing Test-time Compute methods (BoN, Step-wise BoN, Agent
Workflow, and Self-Refine) by using OpenAI's GPT-4o as a backbone on general
reasoning benchmarks in three domains (i.e., math, coding, commonsense
reasoning). Specifically, first, our experiments show that the o1 model has
achieved the best performance on most datasets. Second, as for the methods of
searching diverse responses (e.g., BoN), we find the reward models' capability
and the search space both limit the upper boundary of these methods. Third, as
for the methods that break the problem into many sub-problems, the Agent
Workflow has achieved better performance than Step-wise BoN due to the
domain-specific system prompt for planning better reasoning processes. Fourth,
it is worth mentioning that we have summarized six reasoning patterns of o1,
and provided a detailed analysis on several reasoning benchmarks.
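To illustrate the Best-of-N (BoN) idea the abstract compares against, here is a minimal sketch. The `generate` and `reward` callables are hypothetical stand-ins for an LLM sampler and a reward model; as the abstract notes, the reward model's capability and the size of the sampled search space bound what this selection step can recover.

```python
def best_of_n(prompt, generate, reward, n=4):
    """Sample n candidate responses and return the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins: the "model" emits canned answers and the "reward model"
# simply prefers longer answers. Real systems would call an LLM API and
# a learned reward model here.
answers = iter(["42", "The answer is 6*7 = 42", "idk"])
pick = best_of_n("What is 6*7?", lambda p: next(answers), reward=len, n=3)
print(pick)
```

Step-wise BoN applies the same select-the-best loop at each intermediate reasoning step rather than once over whole responses, which is why it depends even more heavily on the reward model's per-step judgment.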