DynaSaur: Large Language Agents Beyond Predefined Actions

Abstract

Existing LLM agent systems typically select actions from a fixed, predefined set at every step. While this approach is effective in closed, narrowly scoped environments, we argue that it presents two major challenges when deploying LLM agents in real-world scenarios: (1) selecting from a fixed set of actions significantly restricts the planning and acting capabilities of LLM agents, and (2) this approach requires substantial human effort to enumerate and implement all possible actions, which becomes impractical in complex environments with a vast number of potential actions. In this work, we propose an LLM agent framework that enables the dynamic creation and composition of actions in an online manner. In this framework, the agent interacts with the environment by generating and executing programs written in a general-purpose programming language at each step. Furthermore, generated actions are accumulated over time for future reuse. Our extensive experiments on the GAIA benchmark demonstrate that this framework offers significantly greater flexibility and outperforms previous methods. Notably, it allows an LLM agent to recover in scenarios where no relevant action exists in the predefined set or where existing actions fail due to unforeseen edge cases. At the time of writing, we hold the top position on the GAIA public leaderboard. Our code is available at https://github.com/adobe-research/dynasaur.
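The loop described above (generate a program, execute it, keep the functions it defines for later steps) can be illustrated with a minimal Python sketch. It is not DynaSaur's implementation: the class `ActionLibrary` and its method `run_step` are hypothetical names chosen for illustration, the LLM is replaced by hard-coded program strings, and `exec` runs code with no sandboxing, which a real agent would need.

```python
import ast
from typing import Any, Dict


class ActionLibrary:
    """Hypothetical store of agent-generated actions (plain Python functions).

    WARNING: uses exec() with no sandboxing; a real agent would isolate execution.
    """

    def __init__(self) -> None:
        # Maps action (function) name -> its source code, in creation order.
        self.sources: Dict[str, str] = {}

    def namespace(self) -> Dict[str, Any]:
        """Build a fresh namespace in which every accumulated action is defined."""
        ns: Dict[str, Any] = {}
        for src in self.sources.values():
            exec(src, ns)
        return ns

    def run_step(self, program: str) -> Dict[str, Any]:
        """Execute one generated program and keep the functions it defines."""
        ns = self.namespace()
        exec(program, ns)  # earlier actions are callable from `program`
        for node in ast.parse(program).body:
            if isinstance(node, ast.FunctionDef):
                # Store the source of each top-level function for later reuse.
                self.sources[node.name] = ast.get_source_segment(program, node)
        return ns


# Usage with a mocked LLM: each string stands in for one generated program.
lib = ActionLibrary()
step1 = """
def word_count(text):
    return len(text.split())

result = word_count("dynamic actions beat fixed ones")
"""
print(lib.run_step(step1)["result"])  # 5

# A later step reuses the accumulated action without redefining it.
step2 = 'result = word_count("reuse me") * 10'
print(lib.run_step(step2)["result"])  # 20
```

Rebuilding the namespace from stored sources at each step keeps stray variables from earlier programs out of later ones while still exposing every accumulated action. The trade-off in this sketch is that a stored function cannot rely on module-level state (for example an `import`) that lived only in the program that defined it.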
