MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection
April 9, 2025
作者: Rishubh Parihar, Srinjay Sarkar, Sarthak Vora, Jogendra Kundu, R. Venkatesh Babu
cs.AI
Abstract
Current monocular 3D detectors are held back by the limited diversity and scale of real-world datasets. While data augmentation certainly helps, it is particularly difficult to generate realistic scene-aware augmented data for outdoor settings. Most current approaches to synthetic data generation focus on realistic object appearance through improved rendering techniques. However, we show that where and how objects are positioned is just as crucial for training effective 3D monocular detectors. The key obstacle lies in automatically determining realistic object placement parameters, including position, dimensions, and directional alignment, when introducing synthetic objects into actual scenes. To address this, we introduce MonoPlace3D, a novel system that considers the 3D scene content to create realistic augmentations. Specifically, given a background scene, MonoPlace3D learns a distribution over plausible 3D bounding boxes. Subsequently, we render realistic objects and place them according to the locations sampled from the learned distribution. Our comprehensive evaluation on two standard datasets, KITTI and NuScenes, demonstrates that MonoPlace3D significantly improves the accuracy of multiple existing monocular 3D detectors while being highly data efficient.
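
For illustration, below is a minimal, hypothetical Python sketch of the augmentation loop the abstract describes: sample plausible 3D boxes (position, dimensions, orientation) for a background scene, then use those boxes as placements for rendered objects and as extra 3D labels for detector training. The function names, the toy Gaussian lane-based placement model, and all parameter values are assumptions for illustration only; they stand in for the paper's learned, scene-conditioned placement distribution and rendering pipeline, which are not specified in this abstract.

```python
# Hypothetical sketch (not the authors' code) of the augmentation idea described
# above: sample plausible 3D boxes for a background scene, then render objects at
# those placements and reuse the boxes as extra 3D ground truth for training.
import random
from dataclasses import dataclass


@dataclass
class Box3D:
    """A 3D bounding box: center, size, and heading, in metric ego coordinates."""
    x: float        # lateral offset from the camera (m)
    y: float        # vertical position; 0.0 = on the ground plane (m)
    z: float        # depth along the viewing direction (m)
    length: float   # object dimensions (m)
    width: float
    height: float
    yaw: float      # heading angle (rad)


def sample_plausible_box(lane_center_x: float, lane_heading: float) -> Box3D:
    """Toy stand-in for a learned placement distribution: cars sit near the lane
    center, on the ground plane, with headings aligned to the road direction."""
    return Box3D(
        x=random.gauss(lane_center_x, 0.3),
        y=0.0,
        z=random.uniform(10.0, 50.0),
        length=random.gauss(4.2, 0.3),
        width=random.gauss(1.8, 0.1),
        height=random.gauss(1.6, 0.1),
        yaw=random.gauss(lane_heading, 0.05),
    )


def augment_scene(num_objects: int, lane_center_x: float = 1.75,
                  lane_heading: float = 0.0) -> list:
    """Draw several placements for one scene; a full pipeline would render an
    object into the image at each box and append the box to the scene's labels."""
    return [sample_plausible_box(lane_center_x, lane_heading)
            for _ in range(num_objects)]


if __name__ == "__main__":
    for box in augment_scene(num_objects=3):
        print(box)
```

In the actual method, the placement distribution is learned from data and conditioned on the 3D scene content, and objects are rendered realistically before compositing; the Gaussian lane model above only illustrates the interface between placement sampling and augmentation.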