ATLaS: Agent Tuning via Learning Critical Steps
March 4, 2025
Authors: Zhixun Chen, Ming Li, Yuxuan Huang, Yali Du, Meng Fang, Tianyi Zhou
cs.AI
Abstract
Large Language Model (LLM) agents have demonstrated remarkable generalization
capabilities across multi-domain tasks. Existing agent tuning approaches
typically employ supervised finetuning on entire expert trajectories. However,
behavior-cloning of full trajectories can introduce expert bias and weaken
generalization to states not covered by the expert data. Additionally, critical
steps, such as planning, complex reasoning for intermediate subtasks, and
strategic decision-making, are essential to success in agent tasks, so learning
these steps is the key to improving LLM agents. For more effective and
efficient agent tuning, we propose ATLaS, which identifies the critical steps
in expert trajectories and finetunes LLMs solely on these steps at reduced
cost. By steering the training's focus to a few critical steps, our method
mitigates the risk of overfitting entire trajectories and promotes
generalization across different environments and tasks. In extensive
experiments, an LLM finetuned on only the 30% of steps selected as critical by ATLaS
outperforms the LLM finetuned on all steps and recent open-source LLM agents.
ATLaS maintains and improves base LLM skills as generalist agents interacting
with diverse environments.
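The abstract's core recipe, select the most critical ~30% of steps in each expert trajectory and finetune only on those, can be sketched as below. This is a hypothetical illustration, not the paper's method: ATLaS's actual criticality scoring is not specified in the abstract, so `score_step` here is a stand-in (a toy heuristic scoring by reasoning length).

```python
def select_critical_steps(trajectory, score_step, fraction=0.3):
    """Return indices of the top-`fraction` highest-scoring steps, in order."""
    scores = [score_step(step) for step in trajectory]
    k = max(1, round(len(trajectory) * fraction))
    # Take the k highest-scoring step indices, then restore trajectory order.
    top_k = sorted(range(len(trajectory)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top_k)

# Toy expert trajectory; the "reasoning" length stands in for a real
# criticality score (e.g. planning and strategic-decision steps score high).
trajectory = [
    {"action": "look",   "reasoning": "scan"},
    {"action": "plan",   "reasoning": "decompose the task into subgoals first"},
    {"action": "move",   "reasoning": "go"},
    {"action": "decide", "reasoning": "choose tool A over B given constraints"},
    {"action": "noop",   "reasoning": ""},
]
critical = select_critical_steps(trajectory, lambda s: len(s["reasoning"]))
# The finetuning set would then contain only these steps (about 30% of the
# trajectory), rather than behavior-cloning every step.
```

In a real pipeline, the supervised loss would simply be masked to the selected step indices, so the rest of the trajectory still appears as context but contributes no gradient.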