AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Abstract

Autonomous agents have become increasingly important for interacting with the real world. Android agents in particular have recently become a frequently discussed means of interaction. However, existing work on training and evaluating Android agents lacks a systematic study that covers both open-source and closed-source models. In this work, we propose AndroidLab, a systematic Android agent framework. It includes an operation environment with different modalities, an action space, and a reproducible benchmark, and it supports both large language models (LLMs) and large multimodal models (LMMs) in the same action space. The AndroidLab benchmark includes predefined Android virtual devices and 138 tasks across nine apps built on these devices. Using the AndroidLab environment, we develop an Android Instruction dataset and train six open-source LLMs and LMMs, raising the average success rate from 4.59% to 21.50% for LLMs and from 1.93% to 13.28% for LMMs. AndroidLab is open source and publicly available at https://github.com/THUDM/Android-Lab.

