Search-o1: Agentic Search-Enhanced Large Reasoning Models

January 9, 2025
Authors: Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, Zhicheng Dou
cs.AI

Abstract

Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents. Search-o1 integrates an agentic search workflow into the reasoning process, enabling dynamic retrieval of external knowledge when LRMs encounter uncertain knowledge points. Additionally, due to the verbose nature of retrieved documents, we design a separate Reason-in-Documents module to deeply analyze the retrieved information before injecting it into the reasoning chain, minimizing noise and preserving coherent reasoning flow. Extensive experiments on complex reasoning tasks in science, mathematics, and coding, as well as six open-domain QA benchmarks, demonstrate the strong performance of Search-o1. This approach enhances the trustworthiness and applicability of LRMs in complex reasoning tasks, paving the way for more reliable and versatile intelligent systems. The code is available at https://github.com/sunnynexus/Search-o1.
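The agentic workflow described above can be sketched as a simple control loop: the model reasons until it emits a search query, the framework retrieves documents, a Reason-in-Documents step condenses them, and the refined knowledge is injected back into the reasoning chain. The sketch below is a minimal illustration under stated assumptions — the special-token names, helper functions, and their signatures are hypothetical, not the paper's actual implementation.

```python
# Minimal sketch of an agentic search-in-reasoning loop in the spirit of
# Search-o1. The special tokens and the generate/search/reason_in_documents
# callables are assumptions for illustration, not the authors' code.

BEGIN_Q, END_Q = "<|begin_search_query|>", "<|end_search_query|>"
BEGIN_R, END_R = "<|begin_search_result|>", "<|end_search_result|>"

def agentic_reason(generate, search, reason_in_documents, question,
                   max_searches=5):
    """Reason step by step, pausing to retrieve whenever the model
    emits a search query between BEGIN_Q and END_Q tokens."""
    chain = f"Question: {question}\n"
    for _ in range(max_searches):
        chain = generate(chain)  # model continues the reasoning chain
        # Only look at text produced after the last injected result:
        tail = chain.rsplit(END_R, 1)[-1]
        if BEGIN_Q not in tail:
            break  # no new query emitted -> reasoning is finished
        # Extract the most recent search query from the chain.
        query = chain.rsplit(BEGIN_Q, 1)[-1].split(END_Q, 1)[0].strip()
        docs = search(query)  # agentic retrieval step
        # Condense verbose documents before injection to reduce noise.
        refined = reason_in_documents(query, docs, chain)
        chain += f"\n{BEGIN_R}{refined}{END_R}\n"
    return chain
```

With stub callables standing in for the LRM and the retriever, one pass through the loop retrieves once and then finishes, showing how refined knowledge is spliced into the chain without breaking its flow.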
