Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments
January 18, 2025
Authors: Hongjin Su, Ruoxi Sun, Jinsung Yoon, Pengcheng Yin, Tao Yu, Sercan Ö. Arık
cs.AI
Abstract
Autonomous agents powered by large language models (LLMs) have the potential
to enhance human capabilities, assisting with digital tasks from sending emails
to performing data analysis. The abilities of existing LLMs at such tasks are
often hindered by the lack of high-quality agent data from the corresponding
environments they interact with. We propose Learn-by-interact, a data-centric
framework to adapt LLM agents to any given environment without human
annotations. Learn-by-interact synthesizes trajectories of agent-environment
interactions based on documentation, and constructs instructions by
summarizing or abstracting the interaction histories, a process called backward
construction. We assess the quality of our synthetic data by using them in both
training-based scenarios and training-free in-context learning (ICL), where we
craft innovative retrieval approaches optimized for agents. Extensive
experiments on SWE-bench, WebArena, OSWorld and Spider2-V, spanning
realistic coding, web, and desktop environments, show the effectiveness of
Learn-by-interact in various downstream agentic tasks: baseline results are
improved by up to 12.2% for ICL with Claude-3.5 and 19.5% for training with
Codestral-22B. We further demonstrate the critical role of backward
construction, which provides up to 14.0% improvement for training. Our
ablation studies demonstrate the efficiency provided by our synthesized data in
ICL and the superiority of our retrieval pipeline over alternative approaches
like conventional retrieval-augmented generation (RAG). We expect that
Learn-by-interact will serve as a foundation for agent data synthesis as LLMs
are increasingly deployed in real-world environments.
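
To make the idea of backward construction concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Step` type, the `backward_construct` function, and the `toy_summarize` placeholder (which stands in for an LLM summarization call) are all hypothetical names introduced for illustration. The only point it shows is the direction of the process: the interaction history comes first, and the instruction is derived from it afterwards.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One agent-environment interaction (hypothetical structure)."""
    action: str
    observation: str


def backward_construct(
    trajectory: List[Step],
    summarize: Callable[[List[Step]], str],
) -> Tuple[str, List[Step]]:
    """Derive an instruction from an already-collected trajectory.

    The instruction is produced after the fact by summarizing the
    interaction history, so it describes what the agent actually did.
    """
    instruction = summarize(trajectory)
    return instruction, trajectory


def toy_summarize(steps: List[Step]) -> str:
    # Assumption: a real system would call an LLM here; this placeholder
    # simply joins the recorded actions into one sentence.
    return "Task: " + "; then ".join(step.action for step in steps)


trajectory = [
    Step("open the inbox", "inbox page shown"),
    Step("click Compose", "draft window opened"),
]
instruction, steps = backward_construct(trajectory, toy_summarize)
print(instruction)
```

The resulting (instruction, trajectory) pair is the kind of synthetic example that could then be used for training or retrieved as an in-context demonstration.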