UFO2: The Desktop AgentOS

April 20, 2025
Authors: Chaoyun Zhang, He Huang, Chiming Ni, Jian Mu, Si Qin, Shilin He, Lu Wang, Fangkai Yang, Pu Zhao, Chao Du, Liqun Li, Yu Kang, Zhao Jiang, Suzhen Zheng, Rujia Wang, Jiaxu Qian, Minghua Ma, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
cs.AI

Abstract

Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows through natural language. However, most existing CUAs remain conceptual prototypes, hindered by shallow OS integration, fragile screenshot-based interaction, and disruptive execution. We present UFO2, a multi-agent AgentOS for Windows desktops that elevates CUAs into practical, system-level automation. UFO2 features a centralized HostAgent for task decomposition and coordination, alongside a collection of application-specialized AppAgents, each equipped with native APIs, domain-specific knowledge, and a unified GUI-API action layer. This architecture enables robust task execution while preserving modularity and extensibility. A hybrid control detection pipeline fuses Windows UI Automation (UIA) with vision-based parsing to support diverse interface styles. Speculative multi-action planning further improves runtime efficiency by reducing per-step LLM overhead. Finally, a Picture-in-Picture (PiP) interface runs automation inside an isolated virtual desktop, allowing agents and users to operate concurrently without interference. We evaluate UFO2 on more than 20 real-world Windows applications, demonstrating substantial improvements in robustness and execution accuracy over prior CUAs. Our results show that deep OS integration unlocks a scalable path toward reliable, user-aligned desktop automation.
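
The abstract does not say how the hybrid control detection pipeline combines its two sources. The following is a minimal Python sketch of one plausible fusion rule, not the authors' implementation: controls reported by UI Automation are kept as-is, and a vision-detected control is added only if it does not substantially overlap any UIA control. The names `Control`, `iou`, and `fuse_controls`, and the 0.5 overlap threshold, are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class Control:
    """Hypothetical record for one detected UI control."""
    name: str
    box: Box
    source: str  # "uia" or "vision"


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_controls(uia: List[Control], vision: List[Control],
                  iou_threshold: float = 0.5) -> List[Control]:
    """Keep every UIA control; add vision controls that UIA did not already cover.

    The threshold is an assumed value, not one taken from the paper.
    """
    fused = list(uia)
    for candidate in vision:
        if all(iou(candidate.box, known.box) < iou_threshold for known in uia):
            fused.append(candidate)
    return fused


if __name__ == "__main__":
    uia_controls = [Control("Save", (0, 0, 10, 10), "uia")]
    vision_controls = [
        Control("save_icon", (1, 1, 10, 10), "vision"),        # overlaps "Save"
        Control("custom_canvas", (50, 50, 60, 60), "vision"),  # not seen by UIA
    ]
    for control in fuse_controls(uia_controls, vision_controls):
        print(control.source, control.name)
```

Preferring the UIA entry when the two sources overlap is a design choice made for this sketch: accessibility metadata usually carries a control name and type, whereas a vision detection typically yields only a bounding box.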
