AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

Abstract

Recent advances in Large Language Models (LLMs) have led to intelligent LLM-based agents capable of interacting with graphical user interfaces (GUIs). These agents demonstrate strong reasoning and adaptability, enabling them to perform complex tasks that traditionally required predefined rules. However, their reliance on step-by-step reasoning often makes them inefficient, particularly on routine tasks. In contrast, traditional rule-based systems are efficient but lack the intelligence and flexibility to adapt to novel scenarios. To address this challenge, we propose a novel evolutionary framework for GUI agents that improves operational efficiency while retaining intelligence and flexibility. Our approach incorporates a memory mechanism that records the agent's task execution history. By analyzing this history, the agent identifies repetitive action sequences and evolves high-level actions that act as shortcuts, replacing these low-level operations and improving efficiency. This lets the agent focus on tasks that require more complex reasoning while simplifying routine actions. Experimental results on multiple benchmark tasks demonstrate that our approach significantly outperforms existing methods in both efficiency and accuracy. The code will be open-sourced to support further research.
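To make the shortcut idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the execution history is stored as lists of low-level action labels, finds the most frequent fixed-length action sequence across those trajectories, and replaces each occurrence with a single high-level action. The names `find_frequent_sequence`, `compress`, the action labels, and the shortcut name `"search"` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Action = str  # hypothetical low-level action label, e.g. "tap_search"


def find_frequent_sequence(
    history: Sequence[Sequence[Action]], length: int, min_count: int
) -> Optional[Tuple[Action, ...]]:
    """Return the most frequent contiguous action sequence of the given
    length, or None if no sequence occurs at least min_count times."""
    if length < 1:
        raise ValueError("length must be at least 1")
    counts: Counter = Counter()
    for trajectory in history:
        # Slide a window of `length` actions over each recorded trajectory.
        for i in range(len(trajectory) - length + 1):
            counts[tuple(trajectory[i:i + length])] += 1
    if not counts:
        return None
    pattern, count = counts.most_common(1)[0]
    return pattern if count >= min_count else None


def compress(
    trajectory: Sequence[Action], pattern: Tuple[Action, ...], shortcut: Action
) -> List[Action]:
    """Replace every non-overlapping occurrence of `pattern` in the
    trajectory with a single high-level `shortcut` action."""
    out: List[Action] = []
    i, n = 0, len(pattern)
    while i < len(trajectory):
        if tuple(trajectory[i:i + n]) == pattern:
            out.append(shortcut)
            i += n
        else:
            out.append(trajectory[i])
            i += 1
    return out


# Hypothetical execution history: three recorded task trajectories.
history = [
    ["open_app", "tap_search", "type_query", "press_enter", "tap_result"],
    ["open_app", "tap_search", "type_query", "press_enter", "scroll"],
    ["tap_search", "type_query", "press_enter", "tap_share"],
]

pattern = find_frequent_sequence(history, length=3, min_count=2)
if pattern is not None:
    print(compress(history[0], pattern, "search"))
```

In this toy history the three-step search sequence appears in every trajectory, so the first trajectory shrinks from five low-level steps to three. A real agent would additionally need to decide when a shortcut is applicable on the current screen, which this sketch does not model.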
