The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
November 15, 2024
Authors: Siyuan Hu, Mingyu Ouyang, Difei Gao, Mike Zheng Shou
cs.AI
Abstract
The recently released model, Claude 3.5 Computer Use, stands out as the first
frontier AI model to offer computer use in public beta as a graphical user
interface (GUI) agent. As an early beta, its capability in complex real-world
environments remains unknown. In this case study exploring Claude 3.5
Computer Use, we curate and organize a collection of carefully designed tasks
spanning a variety of domains and software. Observations from these cases
demonstrate Claude 3.5 Computer Use's unprecedented ability in end-to-end
language-to-desktop actions. Along with this study, we provide an
out-of-the-box agent framework for deploying API-based GUI automation models
with easy implementation. Our case studies aim to establish a groundwork of the
capabilities and limitations of Claude 3.5 Computer Use with detailed analyses
and bring to the fore questions about planning, action, and critic, which must
be considered for future improvement. We hope this preliminary exploration will
inspire future research in the GUI agent community. All the test cases in the
paper can be tried through the project:
https://github.com/showlab/computer_use_ootb.