DynaSaur: Large Language Agents Beyond Predefined Actions

Abstract

Existing LLM agent systems typically select actions from a fixed, predefined set at every step. While this approach is effective in closed, narrowly scoped environments, we argue that it presents two major challenges when LLM agents are deployed in real-world scenarios: (1) selecting from a fixed set of actions significantly restricts the planning and acting capabilities of LLM agents, and (2) this approach requires substantial human effort to enumerate and implement all possible actions, which becomes impractical in complex environments with a vast number of potential actions. In this work, we propose an LLM agent framework that enables the dynamic creation and composition of actions in an online manner. In this framework, the agent interacts with the environment by generating and executing programs written in a general-purpose programming language at each step. Furthermore, generated actions are accumulated over time for future reuse. Our extensive experiments on the GAIA benchmark demonstrate that this framework offers significantly greater flexibility and outperforms previous methods. Notably, it allows an LLM agent to recover in scenarios where no relevant action exists in the predefined set or where existing actions fail due to unforeseen edge cases. At the time of writing, we hold the top position on the GAIA public leaderboard. Our code is available at https://github.com/adobe-research/dynasaur.
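The loop summarized above (generate a program, execute it, keep the actions it defines) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes Python as the general-purpose language, stubs out the LLM by passing hand-written program strings, and the names `ActionLibrary`, `execute`, and the `result` convention are hypothetical. Any function a program defines is stored and made visible to later programs, and a failing program returns its traceback as an observation so the agent could revise its code. A real system would run generated code in a sandbox rather than with a bare `exec`.

```python
import traceback
import types
from typing import Any, Callable, Dict, Tuple


class ActionLibrary:
    """Hypothetical store of LLM-generated actions that persists across steps."""

    def __init__(self) -> None:
        self.actions: Dict[str, Callable[..., Any]] = {}

    def execute(self, code: str) -> Tuple[bool, Any]:
        """Run one generated program and keep any functions it defines.

        Returns (ok, observation): the value bound to `result` on success,
        or the formatted traceback on failure so the agent can revise its code.
        """
        # Previously accumulated actions are visible to the new program.
        namespace: Dict[str, Any] = dict(self.actions)
        try:
            exec(code, namespace)
        except Exception:
            return False, traceback.format_exc()
        # Accumulate new or redefined functions for reuse in later steps.
        for name, obj in namespace.items():
            if isinstance(obj, types.FunctionType) and self.actions.get(name) is not obj:
                self.actions[name] = obj
        return True, namespace.get("result")


library = ActionLibrary()

# Step 1: the (stubbed) model defines a new action and uses it.
step1 = '''
def word_count(text):
    return len(text.split())

result = word_count("dynamic actions are composed online")
'''
print(library.execute(step1))  # (True, 5)

# Step 2: a later program composes the accumulated action without redefining it.
step2 = '''
def longest_line(lines):
    return max(lines, key=word_count)

result = longest_line(["a b", "a b c d", "a"])
'''
print(library.execute(step2))  # (True, 'a b c d')

# Step 3: an unforeseen edge case fails; the traceback becomes the observation.
ok, observation = library.execute("result = word_count(None)")
print(ok)  # False
```

Returning the traceback instead of raising mirrors the recovery behavior described in the abstract: the failure is fed back to the agent as ordinary input rather than ending the episode.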
