The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Abstract

Claude 3.5 Computer Use, a recently released model, stands out as the first frontier AI model to offer computer use in public beta as a graphical user interface (GUI) agent. Because it is an early beta, its capability in complex real-world environments remains unknown. In this case study, we explore Claude 3.5 Computer Use by curating and organizing a collection of carefully designed tasks that span a variety of domains and software. Observations from these cases demonstrate an unprecedented ability of Claude 3.5 Computer Use to go end to end from language instructions to desktop actions. Along with this study, we provide an out-of-the-box agent framework for deploying API-based GUI automation models with easy implementation. Our case studies aim to lay out the capabilities and limitations of Claude 3.5 Computer Use with detailed analyses, and to bring to the fore questions about planning, action, and critic that must be considered for future improvement. We hope this preliminary exploration will inspire future research in the GUI agent community. All the test cases in the paper can be tried through the project: https://github.com/showlab/computer_use_ootb.
