RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy

March 31, 2025
Authors: Zhonghan Zhao, Wenwei Zhang, Haian Huang, Kuikun Liu, Jianfei Gao, Gaoang Wang, Kai Chen
cs.AI

Abstract

Reasoning before action and imagining potential outcomes (i.e., world models) are essential for embodied agents operating in complex open-world environments. Yet prior work either incorporates only one of these abilities in an end-to-end agent or integrates multiple specialized models into an agent system, which limits the learning efficiency and generalization of the policy. This paper therefore makes the first attempt to synergize Reasoning and Imagination in an end-to-end Generalist policy, termed RIG. To train RIG end to end, we construct a data pipeline that progressively integrates and enriches the imagination and reasoning content in trajectories collected from existing agents. Jointly learning reasoning and next-image generation explicitly models the inherent correlation between reasoning, action, and environment dynamics, and yields more than 17× better sample efficiency, as well as improved generalization, compared with previous works. During inference, RIG first reasons about the next action, produces a potential action, and then predicts the outcome of that action, which gives the agent a chance to review and self-correct based on the imagination before taking real actions. Experimental results show that the synergy of reasoning and imagination not only improves the robustness, generalization, and interoperability of the generalist policy but also enables test-time scaling to enhance overall performance.
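The inference procedure described above (reason, propose an action, imagine its outcome, then review before acting) can be illustrated with a minimal sketch. This is not the authors' implementation: in RIG a single end-to-end policy produces the reasoning, the action, and the imagined next frame, whereas here they are split into three hypothetical callables (`propose`, `imagine`, `accept`) purely for readability, and `max_reviews` is an assumed knob standing in for a test-time compute budget.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Step:
    """One reason -> act -> imagine attempt (hypothetical record type)."""
    reasoning: str
    action: Any
    imagined: Any


def act_with_review(
    obs: Any,
    propose: Callable[[Any, List[Step]], Tuple[str, Any]],
    imagine: Callable[[Any, Any], Any],
    accept: Callable[[Any, Any], bool],
    max_reviews: int = 3,
) -> Tuple[Any, List[Step]]:
    """Return an action chosen after up to `max_reviews` imagined rollouts.

    All three callables are assumptions made for this sketch:
      propose(obs, history) -> (reasoning text, candidate action)
      imagine(obs, action)  -> predicted next observation
      accept(obs, imagined) -> True if the imagined outcome looks good
    """
    if max_reviews < 1:
        raise ValueError("max_reviews must be at least 1")

    history: List[Step] = []
    for _ in range(max_reviews):
        # Reason about the next action; earlier rejected attempts are visible.
        reasoning, action = propose(obs, history)
        # Predict the outcome without touching the real environment.
        imagined = imagine(obs, action)
        history.append(Step(reasoning, action, imagined))
        # Review: commit only if the imagined outcome passes the check.
        if accept(obs, imagined):
            break
    # Falls back to the last candidate if none was accepted within the budget.
    return history[-1].action, history
```

Under these assumptions, raising `max_reviews` lets the agent spend more computation on imagined rollouts before committing to a real action, which loosely mirrors the test-time scaling the abstract mentions.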
