AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
March 4, 2025
Authors: Wenjia Jiang, Yangyang Zhuang, Chenxi Song, Xu Yang, Chi Zhang
cs.AI
Abstract
Recent advancements in Large Language Models (LLMs) have led to the
development of intelligent LLM-based agents capable of interacting with
graphical user interfaces (GUIs). These agents demonstrate strong reasoning and
adaptability, enabling them to perform complex tasks that traditionally
required predefined rules. However, the reliance on step-by-step reasoning in
LLM-based agents often results in inefficiencies, particularly for routine
tasks. In contrast, traditional rule-based systems excel in efficiency but lack
the intelligence and flexibility to adapt to novel scenarios. To address this
challenge, we propose a novel evolutionary framework for GUI agents that
enhances operational efficiency while retaining intelligence and flexibility.
Our approach incorporates a memory mechanism that records the agent's task
execution history. By analyzing this history, the agent identifies repetitive
action sequences and evolves high-level actions that act as shortcuts,
replacing these low-level operations and improving efficiency. This allows the
agent to focus on tasks requiring more complex reasoning, while simplifying
routine actions. Experimental results on multiple benchmark tasks demonstrate
that our approach significantly outperforms existing methods in both efficiency
and accuracy. The code will be open-sourced to support further research.
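To make the idea of evolving shortcuts from recorded history concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how repetitive sequences are detected, so this example assumes simple frequency counting over fixed-length contiguous action runs. The names `mine_shortcuts`, `compress`, and `shortcut_0`, along with the action strings, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Action = str
Trace = List[Action]


def mine_shortcuts(
    history: List[Trace], length: int = 3, min_count: int = 2
) -> Dict[str, Tuple[Action, ...]]:
    """Return contiguous action runs of `length` seen in at least `min_count` traces."""
    counts: Counter = Counter()
    for trace in history:
        # Count each run at most once per trace so a single repetitive
        # task cannot promote a shortcut on its own.
        runs = {tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)}
        counts.update(runs)
    frequent = sorted(run for run, n in counts.items() if n >= min_count)
    return {f"shortcut_{i}": run for i, run in enumerate(frequent)}


def compress(trace: Trace, shortcuts: Dict[str, Tuple[Action, ...]]) -> Trace:
    """Greedily replace each matching low-level run with its shortcut name."""
    out: Trace = []
    i = 0
    while i < len(trace):
        for name, run in shortcuts.items():
            if tuple(trace[i:i + len(run)]) == run:
                out.append(name)
                i += len(run)
                break
        else:
            # No shortcut starts here, so keep the low-level action as-is.
            out.append(trace[i])
            i += 1
    return out


# Hypothetical execution history: one list of low-level actions per task.
history = [
    ["open_app", "tap_search", "type_query", "tap_result"],
    ["open_app", "tap_search", "type_query", "scroll"],
    ["open_settings", "tap_wifi"],
]
shortcuts = mine_shortcuts(history)
new_plan = compress(["open_app", "tap_search", "type_query", "tap_like"], shortcuts)
print(new_plan)  # ['shortcut_0', 'tap_like']
```

In this toy setting a four-step plan shrinks to two steps, so an agent that reasons once per step would need fewer reasoning calls for the routine part of the task. A real system would also have to check that a shortcut still applies to the current screen before executing it, which this sketch leaves out.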