WORLDMEM: Long-term Consistent World Simulation with Memory
April 16, 2025
Authors: Zeqi Xiao, Yushi Lan, Yifan Zhou, Wenqi Ouyang, Shuai Yang, Yanhong Zeng, Xingang Pan
cs.AI
Abstract
World simulation has gained increasing popularity due to its ability to model
virtual environments and predict the consequences of actions. However, the
limited temporal context window often leads to failures in maintaining
long-term consistency, particularly in preserving 3D spatial consistency. In
this work, we present WorldMem, a framework that enhances scene generation with
a memory bank consisting of memory units that store memory frames and states
(e.g., poses and timestamps). By employing a memory attention mechanism that
effectively extracts relevant information from these memory frames based on
their states, our method is capable of accurately reconstructing previously
observed scenes, even under significant viewpoint or temporal gaps.
Furthermore, by incorporating timestamps into the states, our framework not
only models a static world but also captures its dynamic evolution over time,
enabling both perception and interaction within the simulated world. Extensive
experiments in both virtual and real scenarios validate the effectiveness of
our approach.
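The memory-bank mechanism the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class names, the (pose, timestamp) state format, and the handcrafted distance-based scoring rule are all assumptions — the paper uses a learned memory attention mechanism over memory-frame states to extract relevant information.

```python
import math
from dataclasses import dataclass

# Illustrative sketch: each memory unit pairs a generated frame with its
# state (camera pose and timestamp). Retrieval ranks units by how close
# their state is to the query state, so frames observed from nearby
# viewpoints and times can condition the next generation step.

@dataclass
class MemoryUnit:
    frame: object      # placeholder for image/frame features
    pose: tuple        # (x, y, z, yaw) camera pose -- assumed format
    timestamp: float

class MemoryBank:
    def __init__(self):
        self.units = []

    def add(self, frame, pose, timestamp):
        self.units.append(MemoryUnit(frame, pose, timestamp))

    def retrieve(self, query_pose, query_time, k=2, time_weight=0.1):
        # Score = negative pose distance minus a small temporal penalty.
        # A handcrafted stand-in for the paper's learned memory attention.
        def score(u):
            pose_dist = math.dist(u.pose[:3], query_pose[:3])
            return -(pose_dist + time_weight * abs(u.timestamp - query_time))
        return sorted(self.units, key=score, reverse=True)[:k]
```

For example, after storing frames at several poses, querying with a pose close to an earlier one should surface that earlier frame first — the property that lets previously observed scenes be reconstructed even after large viewpoint or temporal gaps.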