
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

February 20, 2025
Authors: Haowei Liu, Xi Zhang, Haiyang Xu, Yuyang Wanyan, Junyang Wang, Ming Yan, Ji Zhang, Chunfeng Yuan, Changsheng Xu, Weiming Hu, Fei Huang
cs.AI

Abstract

In the field of MLLM-based GUI agents, the PC scenario features a more complex interactive environment than smartphones and involves more intricate intra- and inter-app workflows. To address these issues, we propose a hierarchical agent framework named PC-Agent. From the perception perspective, we devise an Active Perception Module (APM) to compensate for the limited ability of current MLLMs to perceive screenshot content. From the decision-making perspective, to handle complex user instructions and interdependent subtasks more effectively, we propose a hierarchical multi-agent collaboration architecture that decomposes decision-making into Instruction-Subtask-Action levels. Within this architecture, three agents (Manager, Progress and Decision) handle instruction decomposition, progress tracking and step-by-step decision-making, respectively. Additionally, a Reflection agent enables timely bottom-up error feedback and adjustment. We also introduce a new benchmark, PC-Eval, with 25 real-world complex instructions. Empirical results on PC-Eval show that PC-Agent achieves a 32% absolute improvement in task success rate over previous state-of-the-art methods. The code will be publicly available.
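
The Instruction-Subtask-Action hierarchy described in the abstract can be pictured with a small control-flow sketch. This is not the authors' implementation: every name below (`manager_decompose`, `decision_next_action`, `reflection_check`, `run`, `Progress`) is hypothetical, the agents are plain Python stubs rather than MLLM calls, and the retry policy is an assumption made only to illustrate how bottom-up feedback might drive a second attempt.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Progress:
    """Hypothetical record kept by a progress tracker: finished subtasks."""
    completed: List[str] = field(default_factory=list)


def manager_decompose(instruction: str) -> List[str]:
    """Stand-in for a Manager agent (assumption): a real system would ask an
    MLLM to decompose the instruction; here we simply split on ' then '."""
    return [part.strip() for part in instruction.split(" then ") if part.strip()]


def decision_next_action(subtask: str, attempt: int) -> str:
    """Stand-in for a Decision agent (assumption): propose one action string."""
    return f"do[{subtask}]#attempt{attempt}"


def reflection_check(action: str, execute: Callable[[str], bool]) -> bool:
    """Stand-in for a Reflection agent (assumption): run the action and
    report whether it succeeded, so the caller can retry on failure."""
    return execute(action)


def run(instruction: str, execute: Callable[[str], bool],
        max_attempts: int = 3) -> Progress:
    """Instruction level -> subtask level -> action level."""
    progress = Progress()
    for subtask in manager_decompose(instruction):
        for attempt in range(1, max_attempts + 1):
            action = decision_next_action(subtask, attempt)
            if reflection_check(action, execute):
                progress.completed.append(subtask)
                break
        else:
            # Every attempt failed: stop, since later subtasks may depend
            # on this one (an assumption of this sketch).
            break
    return progress
```

In this sketch the outer loop plays the role of the instruction-to-subtask level, the inner loop the subtask-to-action level, and the boolean returned by `reflection_check` is the only feedback signal; a real system would presumably pass richer information between agents.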
