QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
February 4, 2025
Authors: Zongyu Lin, Yao Tang, Xingcheng Yao, Da Yin, Ziniu Hu, Yizhou Sun, Kai-Wei Chang
cs.AI
Abstract
Language agents have become a promising solution to complex interactive
tasks. One of the key ingredients to the success of language agents is the
reward model on the trajectory of the agentic workflow, which provides valuable
guidance during training or inference. However, due to the lack of annotations
of intermediate interactions, most existing works use an outcome reward model
to optimize policies across entire trajectories. This may lead to sub-optimal
policies and hinder the overall performance. To address this, we propose QLASS
(Q-guided Language Agent Stepwise Search), to automatically generate
annotations by estimating Q-values in a stepwise manner for open language
agents. By introducing a reasoning tree and performing process reward modeling,
QLASS provides effective intermediate guidance for each step. With the stepwise
guidance, we propose a Q-guided generation strategy to enable language agents
to better adapt to long-term value, resulting in significant performance
improvement during model inference on complex interactive agent tasks. Notably,
even with almost half the annotated data, QLASS retains strong performance,
demonstrating its efficiency in handling limited supervision. We also
empirically demonstrate that QLASS can lead to more effective decision making
through qualitative analysis. We will release our code and data.
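The core idea of the Q-guided generation strategy described above can be illustrated with a minimal sketch. This is not the authors' implementation: `propose_actions` stands in for an LLM policy sampling candidate actions, and `q_value` stands in for the learned process reward (Q) model; both are hypothetical toy functions. At each step, the agent proposes several candidates and greedily keeps the one with the highest estimated Q-value.

```python
def propose_actions(trajectory, k=3):
    """Stand-in for an LLM policy: propose k candidate actions for the
    current state (here, dummy labels indexed by depth and candidate)."""
    return [f"step{len(trajectory)}-cand{i}" for i in range(k)]

def q_value(trajectory, action):
    """Stand-in for a learned process reward model estimating the
    long-term value (Q-value) of taking `action` from the current state.
    Here: a deterministic toy score favoring lower candidate indices."""
    cand_index = int(action.split("cand")[1])
    return 1.0 / (1 + cand_index)

def q_guided_search(max_steps=4, k=3):
    """Greedy Q-guided stepwise generation: at every step, sample k
    candidate actions and commit to the one with the highest Q-value,
    so the trajectory follows estimated long-term value rather than
    only the outcome reward at the end."""
    trajectory = []
    for _ in range(max_steps):
        candidates = propose_actions(trajectory, k)
        best = max(candidates, key=lambda a: q_value(trajectory, a))
        trajectory.append(best)
    return trajectory

print(q_guided_search())
```

In practice the Q-values would come from a process reward model trained on stepwise annotations derived from the reasoning tree, and the candidates from the language agent itself; the greedy `max` could also be replaced by beam search over Q-scored partial trajectories.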