
MobA: A Two-Level Agent System for Efficient Mobile Task Automation

October 17, 2024
Authors: Zichen Zhu, Hao Tang, Yansi Li, Kunyao Lan, Yixuan Jiang, Hao Zhou, Yixiao Wang, Situo Zhang, Liangtai Sun, Lu Chen, Kai Yu
cs.AI

Abstract

Current mobile assistants are limited by dependence on system APIs or struggle with complex user instructions and diverse interfaces due to restricted comprehension and decision-making abilities. To address these challenges, we propose MobA, a novel Mobile phone Agent powered by multimodal large language models that enhances comprehension and planning capabilities through a sophisticated two-level agent architecture. The high-level Global Agent (GA) is responsible for understanding user commands, tracking history memories, and planning tasks. The low-level Local Agent (LA) predicts detailed actions in the form of function calls, guided by sub-tasks and memory from the GA. Integrating a Reflection Module allows for efficient task completion and enables the system to handle previously unseen complex tasks. MobA demonstrates significant improvements in task execution efficiency and completion rate in real-life evaluations, underscoring the potential of MLLM-empowered mobile assistants.
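
The division of labor described above can be illustrated with a small sketch. This is not the authors' implementation: the class names, the `plan`/`predict`/`reflect` functions, the `Action` and `Memory` structures, and the `TOOLS` table are all hypothetical, and both MLLM calls are replaced by trivial string-splitting stubs. It only shows the control flow implied by the abstract, in which a Global Agent plans sub-tasks and keeps memory, a Local Agent turns each sub-task into a function call, and a reflection step checks the call before it runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """A function call predicted by the Local Agent (hypothetical format)."""
    name: str
    args: Dict[str, str]


@dataclass
class Memory:
    """History shared between the two levels (hypothetical structure)."""
    history: List[str] = field(default_factory=list)


class GlobalAgent:
    """High level: interprets the command, tracks memory, plans sub-tasks."""

    def __init__(self, memory: Memory):
        self.memory = memory

    def plan(self, command: str) -> List[str]:
        # Stub standing in for an MLLM call: split the command on " then ".
        return [part.strip() for part in command.split(" then ")]

    def record(self, note: str) -> None:
        self.memory.history.append(note)


class LocalAgent:
    """Low level: turns one sub-task into a concrete function call."""

    def predict(self, sub_task: str, memory: Memory) -> Action:
        # Stub standing in for an MLLM call: the first word is taken as
        # the function name and the remainder as its target argument.
        verb, _, target = sub_task.partition(" ")
        return Action(name=verb, args={"target": target})


def reflect(action: Action, tools: Dict[str, Callable[..., str]]) -> bool:
    """Toy reflection check: is the predicted call executable at all?"""
    return action.name in tools


def run(command: str, tools: Dict[str, Callable[..., str]]) -> List[str]:
    memory = Memory()
    ga, la = GlobalAgent(memory), LocalAgent()
    for sub_task in ga.plan(command):
        action = la.predict(sub_task, memory)
        if not reflect(action, tools):
            # Skip calls the toy reflection step rejects, but remember them.
            ga.record(f"skipped: {sub_task}")
            continue
        ga.record(tools[action.name](**action.args))
    return memory.history


# Hypothetical device functions; a real system would drive the phone UI.
TOOLS = {
    "open": lambda target: f"opened {target}",
    "tap": lambda target: f"tapped {target}",
}

history = run("open Settings then tap Wi-Fi then swipe up", TOOLS)
print(history)
```

In this toy run the third sub-task is rejected because no `swipe` function is registered, which stands in for the kind of check a reflection step might perform; the abstract does not specify how the actual Reflection Module decides.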

