WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
April 22, 2025
Authors: Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang
cs.AI
Abstract
Can we build accurate world models out of large language models (LLMs)? How
can world models benefit LLM agents? The gap between the prior knowledge of
LLMs and the specified environment's dynamics usually bottlenecks LLMs'
performance as world models. To bridge the gap, we propose a training-free
"world alignment" that learns an environment's symbolic knowledge complementary
to LLMs. The symbolic knowledge covers action rules, knowledge graphs, and
scene graphs, which are extracted by LLMs from exploration trajectories and
encoded into executable code to regulate LLM agents' policies. We further
propose an RL-free, model-based agent "WALL-E 2.0" through the model-predictive
control (MPC) framework. Unlike classical MPC requiring costly optimization on
the fly, we adopt an LLM agent as an efficient look-ahead optimizer of future
steps' actions by interacting with the neurosymbolic world model. While the LLM
agent's strong heuristics make it an efficient planner in MPC, the quality of
its planned actions is also secured by the accurate predictions of the aligned
world model. They together considerably improve learning efficiency in a new
environment. On open-world challenges in Mars (Minecraft-like) and ALFWorld
(embodied indoor environments), WALL-E 2.0 significantly outperforms existing
methods, e.g., surpassing baselines in Mars by 16.1%-51.6% in success rate and
by at least 61.7% in score. In ALFWorld, it achieves a new record of 98% success
rate after only 4 iterations.
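The mechanism the abstract describes, learned action rules compiled into executable checks that regulate an LLM planner inside an MPC look-ahead loop, can be sketched as below. Every name here (`Rule`, `craft_requires_table`, `transition`, `check_plan`, `mpc_step`, `stub_llm`) is a hypothetical illustration under simplified assumptions, not the paper's actual interface.

```python
# Hedged sketch: symbolic rules regulate an LLM planner in an MPC loop.
# All names are illustrative; they are not WALL-E 2.0's real API.
from typing import Callable, List, Optional

# A learned action rule compiled to code: returns an error message when an
# action would violate the environment's dynamics, else None.
Rule = Callable[[dict, str], Optional[str]]

def craft_requires_table(state: dict, action: str) -> Optional[str]:
    """Example rule an LLM might extract from failed exploration steps."""
    if action.startswith("craft") and not state.get("near_crafting_table"):
        return f"'{action}' needs a crafting table nearby"
    return None

def transition(state: dict, action: str) -> dict:
    """Toy deterministic transition (stand-in for the world model's
    next-state prediction)."""
    s = dict(state)
    if action == "goto crafting_table":
        s["near_crafting_table"] = True
    return s

def check_plan(state: dict, plan: List[str], rules: List[Rule]) -> List[str]:
    """Roll the plan through the world model, collecting rule violations."""
    violations, s = [], state
    for a in plan:
        violations += [m for r in rules if (m := r(s, a)) is not None]
        s = transition(s, a)
    return violations

def mpc_step(state, task, rules, llm_propose_plan,
             horizon: int = 5, max_retries: int = 3) -> List[str]:
    """One MPC iteration: the LLM proposes a look-ahead plan; symbolic
    rules check it; violations are fed back so the LLM can replan."""
    feedback: List[str] = []
    plan: List[str] = []
    for _ in range(max_retries):
        plan = llm_propose_plan(state, task, feedback)[:horizon]
        errs = check_plan(state, plan, rules)
        if not errs:
            return plan          # plan consistent with learned dynamics
        feedback += errs         # regulate the next proposal
    return plan[:1]              # fall back to a single cautious step

def stub_llm(state, task, feedback):
    """Pretend LLM planner: replans once told about the crafting table."""
    if feedback:
        return ["goto crafting_table", "craft wooden_pickaxe"]
    return ["craft wooden_pickaxe"]

plan = mpc_step({"near_crafting_table": False}, "make a pickaxe",
                [craft_requires_table], stub_llm)
# plan == ["goto crafting_table", "craft wooden_pickaxe"]
```

The key design point this sketch mirrors: the world model's feedback is symbolic and executable, so a rule violation yields a concrete, checkable message rather than another LLM guess, which is what lets the loop converge without RL training.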