The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
November 15, 2024
Authors: Siyuan Hu, Mingyu Ouyang, Difei Gao, Mike Zheng Shou
cs.AI
Abstract
The recently released model, Claude 3.5 Computer Use, stands out as the first
frontier AI model to offer computer use in public beta as a graphical user
interface (GUI) agent. As an early beta, its capability in complex real-world
environments remains unknown. In this case study exploring Claude 3.5
Computer Use, we curate and organize a collection of carefully designed tasks
spanning a variety of domains and software. Observations from these cases
demonstrate Claude 3.5 Computer Use's unprecedented ability to translate
language into desktop actions end to end. Along with this study, we provide an
out-of-the-box agent framework for deploying API-based GUI automation models
with easy implementation. Our case studies aim to establish a groundwork of the
capabilities and limitations of Claude 3.5 Computer Use through detailed analyses
and to bring to the fore questions about planning, action, and critic that must
be considered for future improvement. We hope this preliminary exploration will
inspire future research in the GUI agent community. All the test cases in the
paper can be tried through the project:
https://github.com/showlab/computer_use_ootb.