Agent models: Internalizing Chain-of-Action Generation into Reasoning Models
March 9, 2025
作者: Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Xinyan Wen, Jitao Sang
cs.AI
Abstract
Traditional agentic workflows rely on external prompts to manage interactions
with tools and the environment, which limits the autonomy of reasoning models.
We propose Large Agent Models (LAMs) that internalize the generation of
Chain-of-Action (CoA), enabling the model to autonomously decide when
and how to use external tools. Our proposed AutoCoA framework combines
supervised fine-tuning (SFT) and reinforcement learning (RL), allowing the
model to seamlessly switch between reasoning and action while efficiently
managing environment interactions. Main components include step-level action
triggering, trajectory-level CoA optimization, and an internal world model to
reduce real-environment interaction costs. Evaluations on open-domain QA tasks
demonstrate that AutoCoA-trained agent models significantly outperform
ReAct-based workflows in task completion, especially in tasks that require
long-term reasoning and multi-step actions. Code and dataset are available at
https://github.com/ADaM-BJTU/AutoCoA
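The core idea of internalized Chain-of-Action generation can be sketched as a decoding loop in which the model itself, rather than an external prompt template, decides when to emit a tool call. The sketch below is purely illustrative: the action delimiters, stub model, and stub tool are hypothetical stand-ins and do not reflect the actual AutoCoA implementation in the linked repository.

```python
# Schematic sketch of internalized Chain-of-Action (CoA) decoding.
# All names here (ACTION_OPEN/ACTION_CLOSE, stub_model, stub_tool)
# are hypothetical illustrations, not the AutoCoA repo's real API.

ACTION_OPEN, ACTION_CLOSE = "<action>", "</action>"

def stub_model(context: str) -> str:
    """Stand-in for the agent model: the model itself decides whether
    to emit an action span (step-level action triggering) or answer."""
    if "Observation:" not in context:
        # Model autonomously triggers a tool call mid-reasoning.
        return f"I should look this up. {ACTION_OPEN}search(capital of France){ACTION_CLOSE}"
    return "Final answer: Paris"

def stub_tool(call: str) -> str:
    """Stand-in for the real environment; during training an internal
    world model could answer instead, reducing interaction costs."""
    return "Paris is the capital of France."

def chain_of_action(question: str, max_steps: int = 4) -> str:
    """Interleave reasoning and action until the model stops acting."""
    context = question
    for _ in range(max_steps):
        segment = stub_model(context)
        context += "\n" + segment
        if ACTION_OPEN in segment:  # the model chose to act
            call = segment.split(ACTION_OPEN, 1)[1].split(ACTION_CLOSE, 1)[0]
            context += "\nObservation: " + stub_tool(call)
        else:  # the model chose to stop and answer
            return segment
    return context

print(chain_of_action("What is the capital of France?"))
```

The contrast with a ReAct-style workflow is that here no external scaffold re-prompts the model at each step; the decision to act is part of the model's own generated sequence, which is what the SFT and RL stages train it to produce.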