

FlowReasoner: Reinforcing Query-Level Meta-Agents

April 21, 2025
作者: Hongcheng Gao, Yue Liu, Yufei He, Longxu Dou, Chao Du, Zhijie Deng, Bryan Hooi, Min Lin, Tianyu Pang
cs.AI

Abstract
This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, by distilling DeepSeek R1, we first endow FlowReasoner with the basic reasoning ability to generate multi-agent systems. We then further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward is designed to guide the RL training along three axes: performance, complexity, and efficiency. In this manner, FlowReasoner can generate a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by 10.52% accuracy across three benchmarks. The code is available at https://github.com/sail-sg/FlowReasoner.
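The abstract describes a multi-purpose reward that balances performance, complexity, and efficiency. As a rough illustration only, one way such a reward could be combined is a weighted sum of an execution-success signal and penalties on system size and LLM-call count; the component definitions and weights below are assumptions for illustration, not the authors' exact formulation.

```python
# Hypothetical sketch of a multi-purpose reward combining the three
# aspects named in the abstract (performance, complexity, efficiency).
# All names and weights here are illustrative assumptions.

def multi_purpose_reward(passed: bool, num_agents: int, num_llm_calls: int,
                         w_perf: float = 1.0, w_comp: float = 0.1,
                         w_eff: float = 0.05) -> float:
    """Collapse the three training signals into one scalar reward.

    passed:        whether the generated multi-agent system's output passed
                   the external execution check (performance signal)
    num_agents:    number of agents in the generated system (complexity penalty)
    num_llm_calls: LLM calls made while answering the query (efficiency penalty)
    """
    r_perf = 1.0 if passed else 0.0          # external execution feedback
    r_comp = -w_comp * num_agents            # prefer smaller systems
    r_eff = -w_eff * num_llm_calls           # prefer fewer LLM calls
    return w_perf * r_perf + r_comp + r_eff
```

Under this sketch, a correct but leaner system scores higher than a correct but bloated one, which is the trade-off the multi-purpose reward is meant to encode.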

