WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
April 22, 2025
作者: Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang
cs.AI
Abstract
Can we build accurate world models out of large language models (LLMs)? How
can world models benefit LLM agents? The gap between the prior knowledge of
LLMs and the specified environment's dynamics usually bottlenecks LLMs'
performance as world models. To bridge the gap, we propose a training-free
"world alignment" that learns an environment's symbolic knowledge complementary
to LLMs. The symbolic knowledge covers action rules, knowledge graphs, and
scene graphs, which are extracted by LLMs from exploration trajectories and
encoded into executable code to regulate LLM agents' policies. We further
propose an RL-free, model-based agent "WALL-E 2.0" through the model-predictive
control (MPC) framework. Unlike classical MPC requiring costly optimization on
the fly, we adopt an LLM agent as an efficient look-ahead optimizer of future
steps' actions by interacting with the neurosymbolic world model. While the LLM
agent's strong heuristics make it an efficient planner in MPC, the quality of
its planned actions is also secured by the accurate predictions of the aligned
world model. They together considerably improve learning efficiency in a new
environment. On open-world challenges in Mars (Minecraft-like) and ALFWorld
(embodied indoor environments), WALL-E 2.0 significantly outperforms existing
methods, e.g., surpassing baselines in Mars by 16.1%-51.6% in success rate and
by at least 61.7% in score. In ALFWorld, it achieves a new record of 98%
success rate after only 4 iterations.
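The abstract describes an MPC-style loop in which learned symbolic action rules, encoded as executable code, let the world model veto infeasible plans before they are executed. The following is a minimal toy sketch of that idea; all names, rules, and environment dynamics here are invented for illustration (the actual WALL-E 2.0 uses an LLM agent for proposal and look-ahead prediction):

```python
# Toy environment dynamics: each action adds a set of facts to the state.
# These transitions are hypothetical stand-ins for a Minecraft-like world.
EFFECTS = {
    "chop_tree": {"has_wood"},
    "craft_pickaxe": {"has_pickaxe"},
    "mine_stone": {"has_stone"},
}

# Learned action rules as executable predicates: each returns True if the
# action is admissible in the current state (cf. "symbolic knowledge ...
# encoded into executable code" in the abstract).
RULES = [
    lambda s, a: not (a == "craft_pickaxe" and "has_wood" not in s),
    lambda s, a: not (a == "mine_stone" and "has_pickaxe" not in s),
]

def rollout(state, seq):
    """Predict the final state of an action sequence; None if a rule fails."""
    s = set(state)
    for a in seq:
        if not all(rule(s, a) for rule in RULES):
            return None  # the symbolic world model rejects this transition
        s |= EFFECTS.get(a, set())
    return s

def plan(state, candidates, goal):
    """Return the first candidate whose predicted outcome satisfies the goal."""
    for seq in candidates:
        end = rollout(state, seq)
        if end is not None and goal <= end:
            return seq
    return None

candidates = [
    ["craft_pickaxe", "mine_stone"],               # violates the wood rule
    ["chop_tree", "craft_pickaxe", "mine_stone"],  # feasible plan
]
print(plan(set(), candidates, goal={"has_stone"}))
# -> ['chop_tree', 'craft_pickaxe', 'mine_stone']
```

In this sketch the rule check prunes the first candidate without ever executing it in the environment, which is the efficiency benefit the abstract attributes to an aligned world model.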