
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

March 7, 2025
Authors: Huatong Song, Jinhao Jiang, Yingqian Min, Jie Chen, Zhipeng Chen, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen
cs.AI

Abstract

Existing Large Reasoning Models (LRMs) have shown the potential of reinforcement learning (RL) to enhance the complex reasoning capabilities of Large Language Models (LLMs). While they achieve remarkable performance on challenging tasks such as mathematics and coding, they often rely on their internal knowledge to solve problems, which can be inadequate for time-sensitive or knowledge-intensive questions, leading to inaccuracies and hallucinations. To address this, we propose R1-Searcher, a novel two-stage outcome-based RL approach designed to enhance the search capabilities of LLMs. This method allows LLMs to autonomously invoke external search systems to access additional knowledge during the reasoning process. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start, effectively generalizing to out-of-domain datasets and supporting both Base and Instruct models. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
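
The core mechanism the abstract describes, a model that pauses generation to query an external search system and then resumes reasoning over the returned passages, can be pictured as a simple decoding loop. The sketch below is an illustrative assumption only: the special tag names, the generate/retrieve interfaces, and the call budget are hypothetical placeholders, not the paper's actual implementation.

```python
# Minimal sketch of search-augmented decoding, as the abstract describes it.
# All tag names and both helper interfaces are assumptions for illustration.

SEARCH_OPEN, SEARCH_CLOSE = "<begin_of_search>", "<end_of_search>"
DOCS_OPEN, DOCS_CLOSE = "<begin_of_documents>", "<end_of_documents>"

def generate(prompt: str, stop: list[str]) -> str:
    """Placeholder: one LLM decoding call that halts at any `stop` string."""
    raise NotImplementedError

def retrieve(query: str, k: int = 3) -> list[str]:
    """Placeholder: top-k passages from an external search system."""
    raise NotImplementedError

def answer_with_search(question: str, max_calls: int = 5) -> str:
    """Interleave LLM reasoning with external retrieval until done."""
    context = f"Question: {question}\n"
    for _ in range(max_calls):
        # Decode until the model either finishes or requests a search.
        chunk = generate(context, stop=[SEARCH_CLOSE])
        context += chunk
        if SEARCH_OPEN not in chunk:
            return context  # no search requested: reasoning is complete
        # Take the query the model wrote after the opening search tag.
        query = chunk.rsplit(SEARCH_OPEN, 1)[-1].strip()
        docs = "\n".join(retrieve(query))
        # Append the retrieved passages so reasoning can continue on them.
        context += f"{SEARCH_CLOSE}\n{DOCS_OPEN}\n{docs}\n{DOCS_CLOSE}\n"
    return context
```

Under the paper's outcome-based RL setup, a loop of this shape would be rewarded only on the final answer, so the model itself learns when to emit a search call rather than being supervised step by step.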

