DynaSaur: Large Language Agents Beyond Predefined Actions
November 4, 2024
Authors: Dang Nguyen, Viet Dac Lai, Seunghyun Yoon, Ryan A. Rossi, Handong Zhao, Ruiyi Zhang, Puneet Mathur, Nedim Lipka, Yu Wang, Trung Bui, Franck Dernoncourt, Tianyi Zhou
cs.AI
Abstract
Existing LLM agent systems typically select actions from a fixed and
predefined set at every step. While this approach is effective in closed,
narrowly-scoped environments, we argue that it presents two major challenges
when deploying LLM agents in real-world scenarios: (1) selecting from a fixed
set of actions significantly restricts the planning and acting capabilities of
LLM agents, and (2) this approach requires substantial human effort to
enumerate and implement all possible actions, which becomes impractical in
complex environments with a vast number of potential actions. In this work, we
propose an LLM agent framework that enables the dynamic creation and
composition of actions in an online manner. In this framework, the agent
interacts with the environment by generating and executing programs written in
a general-purpose programming language at each step. Furthermore, generated
actions are accumulated over time for future reuse. Our extensive experiments
on the GAIA benchmark demonstrate that this framework offers significantly
greater flexibility and outperforms previous methods. Notably, it allows an LLM
agent to recover in scenarios where no relevant action exists in the predefined
set or when existing actions fail due to unforeseen edge cases. At the time of
writing, we hold the top position on the GAIA public leaderboard. Our code can
be found at
https://github.com/adobe-research/dynasaur.
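
To make the idea of actions that are created on the fly and accumulated for reuse more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `ActionLibrary` class, its `execute` method, and the two hard-coded program strings (standing in for LLM-generated code) are hypothetical names chosen for illustration, and the sketch omits the sandboxing that any real system executing generated code would need.

```python
from typing import Callable, Dict


class ActionLibrary:
    """Hypothetical store of actions (plain Python functions) created so far."""

    def __init__(self) -> None:
        self.actions: Dict[str, Callable] = {}

    def execute(self, program: str) -> dict:
        """Run one generated program and keep any functions it defines."""
        # Previously accumulated actions are visible to the new program.
        namespace: dict = dict(self.actions)
        exec(program, namespace)  # assumption: trusted input; no sandbox here
        for name, value in namespace.items():
            if callable(value) and not name.startswith("_"):
                self.actions[name] = value
        return namespace


# Stand-ins for programs an LLM might generate at two consecutive steps.
step1 = (
    "def word_count(text):\n"
    "    return len(text.split())\n"
    "result = word_count('actions beyond a fixed set')\n"
)
# The second program composes a new action from the one defined in step 1.
step2 = (
    "def longest(texts):\n"
    "    return max(texts, key=word_count)\n"
    "result = longest(['a b', 'a b c', 'a'])\n"
)

library = ActionLibrary()
out1 = library.execute(step1)
out2 = library.execute(step2)
print(out1["result"], out2["result"], sorted(library.actions))
```

The second program never defines `word_count` itself; it succeeds only because the library carried that function over from the first step, which is the reuse behavior the abstract describes.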