Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments

Abstract

Autonomous agents powered by large language models (LLMs) have the potential to enhance human capabilities, assisting with digital tasks from sending emails to performing data analysis. The abilities of existing LLMs at such tasks are often hindered by the lack of high-quality agent data from the environments they interact with. We propose Learn-by-interact, a data-centric framework to adapt LLM agents to any given environment without human annotations. Learn-by-interact synthesizes trajectories of agent-environment interactions based on documentation, and constructs instructions by summarizing or abstracting the interaction histories, a process called backward construction. We assess the quality of our synthetic data by using it in both training-based scenarios and training-free in-context learning (ICL), where we craft innovative retrieval approaches optimized for agents. Extensive experiments on SWE-bench, WebArena, OSWorld, and Spider2-V, spanning realistic coding, web, and desktop environments, show the effectiveness of Learn-by-interact in various downstream agentic tasks: baseline results are improved by up to 12.2% for ICL with Claude-3.5 and 19.5% for training with Codestral-22B. We further demonstrate the critical role of backward construction, which provides up to 14.0% improvement for training. Our ablation studies demonstrate the efficiency provided by our synthesized data in ICL and the superiority of our retrieval pipeline over alternative approaches like conventional retrieval-augmented generation (RAG). We expect that Learn-by-interact will serve as a foundation for agent data synthesis as LLMs are increasingly deployed in real-world environments.
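
As a rough illustration of backward construction as described above, the sketch below derives an instruction from an already-recorded interaction history and pairs the two as one synthetic example. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `Step`, `summarize_stub`, and `backward_construct` are hypothetical, and the summarizer is a plain string join standing in for what would presumably be an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One agent-environment interaction (hypothetical structure)."""
    action: str
    observation: str


def summarize_stub(steps: List[Step]) -> str:
    """Hypothetical stand-in for an LLM summarizer: chains the actions into one instruction."""
    return "Task: " + "; then ".join(step.action for step in steps)


def backward_construct(
    trajectory: List[Step],
    summarize: Callable[[List[Step]], str] = summarize_stub,
) -> Tuple[str, List[Step]]:
    """Build an (instruction, trajectory) pair by summarizing what the agent actually did.

    The instruction is written after the fact from the interaction history,
    so it matches the recorded trajectory by construction.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one step")
    return summarize(trajectory), trajectory
```

Calling `backward_construct` on a two-step trajectory returns the summarized instruction together with the unchanged list of steps, which could then serve as a training example or as a retrieval entry for ICL.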

