CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities

Abstract

3D scene generation has garnered growing attention in recent years and has made significant progress. Generating 4D cities is more challenging than generating 3D scenes because cities contain structurally complex, visually diverse objects such as buildings and vehicles, and because humans are more sensitive to distortions in urban environments. To tackle these issues, we propose CityDreamer4D, a compositional generative model specifically tailored for generating unbounded 4D cities. Our main insights are: 1) 4D city generation should separate dynamic objects (e.g., vehicles) from static scenes (e.g., buildings and roads), and 2) all objects in the 4D scene should be composed of different types of neural fields for buildings, vehicles, and background stuff. Specifically, we propose a Traffic Scenario Generator and an Unbounded Layout Generator to produce dynamic traffic scenarios and static city layouts using a highly compact BEV representation. Objects in 4D cities are generated by combining stuff-oriented and instance-oriented neural fields for background stuff, buildings, and vehicles. To suit the distinct characteristics of background stuff and instances, the neural fields employ customized generative hash grids and periodic positional embeddings as scene parameterizations. Furthermore, we offer a comprehensive suite of datasets for city generation, including OSM, GoogleEarth, and CityTopia. The OSM dataset provides a variety of real-world city layouts, while the GoogleEarth and CityTopia datasets deliver large-scale, high-quality city imagery complete with 3D instance annotations. Leveraging its compositional design, CityDreamer4D supports a range of downstream applications, such as instance editing, city stylization, and urban simulation, while delivering state-of-the-art performance in generating realistic 4D cities.
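
The first insight, keeping dynamic objects separate from the static scene in a bird's-eye-view (BEV) map, can be illustrated with a toy sketch. This is not the CityDreamer4D implementation: the class ids, the grid size, the single straight road, and the helper names (make_static_layout, vehicle_position, compose_frame) are all assumptions made up for illustration. It only shows the idea that a static layout is produced once, while per-time-step vehicle positions are overlaid on a copy of it.

```python
# Toy illustration only: class ids, grid size, and the vehicle track are
# hypothetical and are not taken from CityDreamer4D.
ROAD, BUILDING, VEHICLE = 1, 2, 3


def make_static_layout(size=8):
    """Build a static BEV semantic map: buildings with one horizontal road."""
    layout = [[BUILDING] * size for _ in range(size)]
    for col in range(size):
        layout[size // 2][col] = ROAD
    return layout


def vehicle_position(t, size=8):
    """Hypothetical traffic scenario: one vehicle moving one cell per step."""
    return (size // 2, t % size)  # (row, col) on the road


def compose_frame(static_layout, t):
    """Overlay the vehicle at time t on a copy of the static layout."""
    frame = [row[:] for row in static_layout]  # static map stays untouched
    row, col = vehicle_position(t, len(static_layout))
    frame[row][col] = VEHICLE
    return frame


static_layout = make_static_layout()
# A short time-indexed sequence of BEV maps sharing one static layout.
frames = [compose_frame(static_layout, t) for t in range(4)]
```

Because the static layout is never modified, changing the traffic (or editing one vehicle) in this sketch does not require regenerating buildings or roads, which mirrors the kind of separation the abstract argues for.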
