Continuous 3D Perception Model with Persistent State
January 21, 2025
Authors: Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A. Efros, Angjoo Kanazawa
cs.AI
Abstract
We present a unified framework capable of solving a broad range of 3D tasks.
Our approach features a stateful recurrent model that continuously updates its
state representation with each new observation. Given a stream of images, this
evolving state can be used to generate metric-scale pointmaps (per-pixel 3D
points) for each new input in an online fashion. These pointmaps reside within
a common coordinate system, and can be accumulated into a coherent, dense scene
reconstruction that updates as new images arrive. Our model, called CUT3R
(Continuous Updating Transformer for 3D Reconstruction), captures rich priors
of real-world scenes: not only can it predict accurate pointmaps from image
observations, but it can also infer unseen regions of the scene by probing at
virtual, unobserved views. Our method is simple yet highly flexible, naturally
accepting varying lengths of images that may be either video streams or
unordered photo collections, containing both static and dynamic content. We
evaluate our method on various 3D/4D tasks and demonstrate competitive or
state-of-the-art performance in each. Project Page: https://cut3r.github.io/
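The online behavior described in the abstract — a persistent state that is folded together with each new observation to emit a pointmap in a shared coordinate frame — can be sketched as a simple recurrent loop. The class, its dimensions, and the linear readout below are all hypothetical stand-ins for illustration; the actual CUT3R model is a learned transformer, not this toy update rule.

```python
import numpy as np

class RecurrentReconstructor:
    """Toy sketch of a stateful online reconstructor (hypothetical interface;
    CUT3R itself is a trained transformer, not this linear toy)."""

    def __init__(self, state_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.state = np.zeros(state_dim)                # persistent scene state
        self.W_img = rng.standard_normal((state_dim, state_dim)) * 0.1
        self.W_state = rng.standard_normal((state_dim, state_dim)) * 0.1

    def _encode(self, image):
        # Stand-in for the image encoder: flatten pixels to a feature vector.
        feat = image.reshape(-1).astype(float)
        return np.resize(feat, self.state.size)

    def update(self, image):
        """Fold one observation into the state, then read out a per-pixel
        pointmap in the shared frame (here: a dummy broadcast readout)."""
        feat = self._encode(image)
        self.state = np.tanh(self.W_state @ self.state + self.W_img @ feat)
        h, w = image.shape[:2]
        return np.broadcast_to(self.state[:3], (h, w, 3)).copy()

model = RecurrentReconstructor()
stream = [np.random.rand(4, 4, 3) for _ in range(3)]    # stands in for frames
cloud = [model.update(frame) for frame in stream]       # accumulated pointmaps
print(len(cloud), cloud[0].shape)                       # 3 (4, 4, 3)
```

Because the state persists across calls, the loop accepts any number of frames, ordered or not, which mirrors the flexibility the abstract claims for video streams and unordered photo collections.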