MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection

April 9, 2025
Authors: Rishubh Parihar, Srinjay Sarkar, Sarthak Vora, Jogendra Kundu, R. Venkatesh Babu
cs.AI

Abstract

Current monocular 3D detectors are held back by the limited diversity and scale of real-world datasets. Data augmentation helps, but generating realistic, scene-aware augmented data is particularly difficult for outdoor settings. Most current approaches to synthetic data generation focus on realistic object appearance through improved rendering techniques. However, we show that where and how objects are positioned is just as crucial for training effective monocular 3D detectors. The key obstacle lies in automatically determining realistic object placement parameters, including position, dimensions, and orientation, when introducing synthetic objects into real scenes. To address this, we introduce MonoPlace3D, a novel system that considers the 3D scene content to create realistic augmentations. Specifically, given a background scene, MonoPlace3D learns a distribution over plausible 3D bounding boxes. We then render realistic objects and place them at locations sampled from the learned distribution. Our comprehensive evaluation on two standard datasets, KITTI and NuScenes, demonstrates that MonoPlace3D significantly improves the accuracy of multiple existing monocular 3D detectors while being highly data efficient.
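
To make the sample-then-place step concrete, here is a minimal sketch of drawing a 3D bounding box from a placement distribution and projecting its anchor point into the image. It is not the authors' implementation. The Gaussian mixture `MIXTURE` is a hypothetical stand-in for the learned, scene-conditioned distribution, and the names `Box3D`, `sample_box`, and `project_center`, the box dimensions, the ground height, and the camera intrinsics are all illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Box3D:
    x: float       # lateral offset in camera coordinates (m)
    y: float       # vertical coordinate of the box anchor (m)
    z: float       # depth along the optical axis (m)
    length: float
    width: float
    height: float
    yaw: float     # rotation about the vertical axis (rad)


# Hypothetical stand-in for a learned placement distribution. Each component is
# (weight, mean_x, mean_z, mean_yaw, std_xz, std_yaw). All values are made up.
MIXTURE = [
    (0.6, -3.0, 20.0, 0.0, 1.0, 0.1),      # e.g. a lane heading away from the camera
    (0.4, 3.0, 30.0, math.pi, 1.5, 0.1),   # e.g. an oncoming lane
]


def sample_box(mixture, rng, ground_y=1.65, dims=(3.9, 1.6, 1.5)):
    """Draw one box: pick a mixture component, then perturb position and yaw."""
    weights = [component[0] for component in mixture]
    _, mx, mz, myaw, s_xz, s_yaw = rng.choices(mixture, weights=weights, k=1)[0]
    x = rng.gauss(mx, s_xz)
    z = max(rng.gauss(mz, s_xz), 1.0)  # clamp so the box stays in front of the camera
    yaw = rng.gauss(myaw, s_yaw)
    length, width, height = dims       # assumed fixed, car-like dimensions
    return Box3D(x, ground_y, z, length, width, height, yaw)


def project_center(box, fx=720.0, fy=720.0, cx=620.0, cy=190.0):
    """Pinhole projection of the box anchor; the intrinsics are assumed values."""
    u = fx * box.x / box.z + cx
    v = fy * box.y / box.z + cy
    return u, v


if __name__ == "__main__":
    rng = random.Random(0)
    box = sample_box(MIXTURE, rng)
    print(box, project_center(box))
```

In a full pipeline, a rendered object would be composited into the background at the sampled box, and the box itself would serve as the 3D label for the augmented training image.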
