The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

November 15, 2024
Authors: Siyuan Hu, Mingyu Ouyang, Difei Gao, Mike Zheng Shou
cs.AI

Abstract

The recently released model, Claude 3.5 Computer Use, stands out as the first frontier AI model to offer computer use in public beta as a graphical user interface (GUI) agent. As an early beta, its capability in complex real-world environments remains unknown. In this case study exploring Claude 3.5 Computer Use, we curate and organize a collection of carefully designed tasks spanning a variety of domains and software. Observations from these cases demonstrate Claude 3.5 Computer Use's unprecedented ability in end-to-end language-to-desktop actions. Along with this study, we provide an out-of-the-box agent framework for deploying API-based GUI automation models with easy implementation. Our case studies aim to lay the groundwork for understanding the capabilities and limitations of Claude 3.5 Computer Use through detailed analyses, and to bring to the fore questions about planning, action, and critic that must be considered for future improvement. We hope this preliminary exploration will inspire future research in the GUI agent community. All the test cases in the paper can be tried through the project: https://github.com/showlab/computer_use_ootb.
