Interpreting Emergent Planning in Model-Free Reinforcement Learning
April 2, 2025
Authors: Thomas Bush, Stephen Chung, Usman Anwar, Adrià Garriga-Alonso, David Krueger
cs.AI
Abstract
We present the first mechanistic evidence that model-free reinforcement
learning agents can learn to plan. This is achieved by applying a methodology
based on concept-based interpretability to a model-free agent in Sokoban -- a
commonly used benchmark for studying planning. Specifically, we demonstrate
that DRC, a generic model-free agent introduced by Guez et al. (2019), uses
learned concept representations to internally formulate plans that both predict
the long-term effects of actions on the environment and influence action
selection. Our methodology involves: (1) probing for planning-relevant
concepts, (2) investigating plan formation within the agent's representations,
and (3) verifying that discovered plans (in the agent's representations) have a
causal effect on the agent's behavior through interventions. We also show that
the emergence of these plans coincides with the emergence of a planning-like
property: the ability to benefit from additional test-time compute. Finally, we
perform a qualitative analysis of the planning algorithm learned by the agent
and discover a strong resemblance to parallelized bidirectional search. Our
findings advance understanding of the internal mechanisms underlying planning
behavior in agents, which is important given the recent trend of emergent
planning and reasoning capabilities in LLMs through RL.
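To make the methodology more concrete, here is a minimal sketch, not the authors' code, of what steps (1) and (3) could look like in practice: a linear probe trained on cached hidden states, followed by a simple activation-space intervention along the probe's weight direction. All names and data below (`hidden_states`, `concept_labels`, the dimensions) are illustrative placeholders; in the paper's setting the vectors would be DRC activations gathered from Sokoban rollouts.

```python
# Minimal sketch of concept probing (step 1) and an activation
# intervention (step 3), using synthetic stand-in data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for cached agent hidden states (one 128-dim vector per
# observation) and binary concept labels, e.g. "a box will later be
# pushed through this square". These are random placeholders; real
# data would come from DRC rollouts on Sokoban.
hidden_states = rng.normal(size=(4096, 128))
concept_labels = rng.integers(0, 2, size=4096)

X_tr, X_te, y_tr, y_te = train_test_split(
    hidden_states, concept_labels, test_size=0.25, random_state=0)

# (1) Probe: if a linear readout decodes the concept well above chance
# on held-out data, the concept is linearly represented in the
# activations. (On this random data, accuracy stays near 0.5.)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")

# (3) Intervention sketch: nudge a hidden state along the probe's
# weight direction to write a different "plan" into the representation,
# then check whether the agent's downstream behavior changes.
w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # concept direction
h = hidden_states[0]
h_intervened = h + 3.0 * w  # strengthen the concept in this state
```

The causal test in step (3) is what separates a genuinely used plan from a merely decodable correlate: only if steering the representation changes the agent's actions can the plan be said to drive behavior.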