FlowReasoner: Reinforcing Query-Level Meta-Agents

April 21, 2025
Authors: Hongcheng Gao, Yue Liu, Yufei He, Longxu Dou, Chao Du, Zhijie Deng, Bryan Hooi, Min Lin, Tianyu Pang
cs.AI

Abstract

This paper proposes FlowReasoner, a query-level meta-agent that automates the design of query-level multi-agent systems, i.e., one system per user query. The core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, we first distill DeepSeek R1 to give FlowReasoner a basic ability to reason about generating multi-agent systems. We then further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward guides the RL training along three aspects: performance, complexity, and efficiency. In this way, FlowReasoner generates a personalized multi-agent system for each user query through deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Notably, it surpasses o1-mini by 10.52% accuracy across three benchmarks. The code is available at https://github.com/sail-sg/FlowReasoner.
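
To make the idea of a multi-purpose reward concrete, here is a minimal sketch of one way a scalar reward could combine performance, complexity, and efficiency signals from execution feedback. This is not the paper's reward function: the `ExecutionFeedback` fields, the weights, the normalization caps, and the linear form are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ExecutionFeedback:
    """Hypothetical execution feedback for one generated multi-agent system."""
    pass_rate: float    # fraction of tests passed, in [0, 1] (performance proxy)
    num_nodes: int      # agents/operators in the generated workflow (complexity proxy)
    num_llm_calls: int  # LLM calls made while running the workflow (efficiency proxy)


def multi_purpose_reward(
    fb: ExecutionFeedback,
    w_perf: float = 1.0,      # assumed weight, not taken from the paper
    w_complex: float = 0.1,   # assumed weight, not taken from the paper
    w_eff: float = 0.1,       # assumed weight, not taken from the paper
    max_nodes: int = 10,      # assumed normalization cap
    max_calls: int = 20,      # assumed normalization cap
) -> float:
    """Reward performance; penalize complexity and cost, each capped at 1."""
    complexity_penalty = min(fb.num_nodes / max_nodes, 1.0)
    efficiency_penalty = min(fb.num_llm_calls / max_calls, 1.0)
    return (
        w_perf * fb.pass_rate
        - w_complex * complexity_penalty
        - w_eff * efficiency_penalty
    )


if __name__ == "__main__":
    simple = ExecutionFeedback(pass_rate=1.0, num_nodes=2, num_llm_calls=4)
    heavy = ExecutionFeedback(pass_rate=1.0, num_nodes=8, num_llm_calls=16)
    # With equal pass rates, the simpler and cheaper workflow scores higher.
    print(multi_purpose_reward(simple) > multi_purpose_reward(heavy))
```

Under these assumptions, two workflows that solve the query equally well are ranked by how small and how cheap they are, which matches the stated goal of guiding RL training along all three aspects at once.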
