Search-o1: Agentic Search-Enhanced Large Reasoning Models

Abstract

Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents. Search-o1 integrates an agentic search workflow into the reasoning process, enabling dynamic retrieval of external knowledge when LRMs encounter uncertain knowledge points. Additionally, due to the verbose nature of retrieved documents, we design a separate Reason-in-Documents module to deeply analyze the retrieved information before injecting it into the reasoning chain, minimizing noise and preserving coherent reasoning flow. Extensive experiments on complex reasoning tasks in science, mathematics, and coding, as well as six open-domain QA benchmarks, demonstrate the strong performance of Search-o1. This approach enhances the trustworthiness and applicability of LRMs in complex reasoning tasks, paving the way for more reliable and versatile intelligent systems. The code is available at https://github.com/sunnynexus/Search-o1.
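
The workflow described in the abstract (reason, pause to search when knowledge is uncertain, condense the retrieved documents, then resume reasoning) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the marker strings, the function names (`agentic_reasoning`, `generate`, `search`, `refine`), the `max_searches` limit, and the toy stand-ins for the model and retriever are all assumptions made for illustration. The abstract does not specify any of them.

```python
import re
from typing import Callable, List

# Hypothetical marker strings; the abstract does not define a query format.
QUERY_OPEN, QUERY_CLOSE = "<search>", "</search>"
RESULT_OPEN, RESULT_CLOSE = "<result>", "</result>"
QUERY_PATTERN = re.compile(
    re.escape(QUERY_OPEN) + r"(.*?)" + re.escape(QUERY_CLOSE), re.DOTALL
)


def agentic_reasoning(
    question: str,
    generate: Callable[[str], str],
    search: Callable[[str], List[str]],
    refine: Callable[[str, str, List[str]], str],
    max_searches: int = 5,
) -> str:
    """Interleave reasoning with retrieval until no further query is issued.

    generate: continues the reasoning chain (stand-in for the LRM).
    search:   returns raw documents for a query (stand-in for a retriever).
    refine:   condenses documents before injection (stand-in for the
              Reason-in-Documents step).
    """
    chain = question
    searches = 0
    while True:
        step = generate(chain)
        chain += step
        match = QUERY_PATTERN.search(step)
        # Stop when the model asks for nothing more or the budget is spent.
        if match is None or searches >= max_searches:
            return chain
        searches += 1
        query = match.group(1).strip()
        documents = search(query)
        # Inject only the condensed summary, never the raw documents.
        summary = refine(chain, query, documents)
        chain += f"{RESULT_OPEN}{summary}{RESULT_CLOSE}"


# Toy stand-ins so the sketch runs without a model or a search engine.
def toy_generate(chain: str) -> str:
    if RESULT_OPEN not in chain:
        return f" I am unsure. {QUERY_OPEN}boiling point of water{QUERY_CLOSE}"
    return " Final answer: 100 degrees Celsius."


def toy_search(query: str) -> List[str]:
    return [
        "Site navigation and ads. "
        "Water boils at 100 degrees Celsius at sea level. "
        "Subscribe to our newsletter"
    ]


def toy_refine(chain: str, query: str, documents: List[str]) -> str:
    # Keep only sentences that share at least one word with the query.
    keywords = set(query.lower().split())
    kept = []
    for document in documents:
        for sentence in document.split(". "):
            if keywords & set(sentence.lower().split()):
                kept.append(sentence.strip())
    return " ".join(kept)


result = agentic_reasoning(
    "Q: At what temperature does water boil at sea level?",
    toy_generate,
    toy_search,
    toy_refine,
)
print(result)
```

In this sketch the refinement step is a simple keyword filter; in the framework described above that role belongs to a separate Reason-in-Documents module, which analyzes the retrieved text in depth. The point of keeping `refine` as its own function is the one the abstract makes: verbose documents are reduced before they enter the reasoning chain, so the chain stays coherent.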
