The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Abstract

The recently released model, Claude 3.5 Computer Use, stands out as the first frontier AI model to offer computer use in public beta as a graphical user interface (GUI) agent. As an early beta, its capability in complex real-world environments remains unknown. In this case study exploring Claude 3.5 Computer Use, we curate and organize a collection of carefully designed tasks spanning a variety of domains and software. Observations from these cases demonstrate Claude 3.5 Computer Use's unprecedented ability to translate language instructions end to end into desktop actions. Along with this study, we provide an out-of-the-box agent framework for deploying API-based GUI automation models with easy implementation. Our case studies aim to lay a groundwork of the capabilities and limitations of Claude 3.5 Computer Use through detailed analyses, and to bring to the fore questions about planning, action, and critic that must be considered for future improvement. We hope this preliminary exploration will inspire future research in the GUI agent community. All the test cases in the paper can be tried through the project: https://github.com/showlab/computer_use_ootb.

