AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Abstract

Autonomous agents have become increasingly important for interacting with the real world. Android agents, in particular, have recently become a frequently mentioned interaction method. However, existing studies on training and evaluating Android agents lack systematic research covering both open-source and closed-source models. In this work, we propose AndroidLab as a systematic Android agent framework. It includes an operation environment with different modalities, an action space, and a reproducible benchmark. It supports both large language models (LLMs) and large multimodal models (LMMs) in the same action space. The AndroidLab benchmark includes predefined Android virtual devices and 138 tasks across nine apps built on these devices. Using the AndroidLab environment, we develop an Android Instruction dataset and train six open-source LLMs and LMMs, raising the average success rate from 4.59% to 21.50% for LLMs and from 1.93% to 13.28% for LMMs. AndroidLab is open-source and publicly available at https://github.com/THUDM/Android-Lab.