ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
October 17, 2024
Authors: Guangda Ji, Silvan Weder, Francis Engelmann, Marc Pollefeys, Hermann Blum
cs.AI
Abstract
The performance of neural networks scales with both their size and the amount
of data they have been trained on. This is shown in both language and image
generation. However, this requires scaling-friendly network architectures as
well as large-scale datasets. Even though scaling-friendly architectures like
transformers have emerged for 3D vision tasks, the GPT-moment of 3D vision
remains distant due to the lack of training data. In this paper, we introduce
ARKit LabelMaker, the first large-scale, real-world 3D dataset with dense
semantic annotations. Specifically, we complement the ARKitScenes dataset with
dense semantic annotations that are automatically generated at scale. To this
end, we extend LabelMaker, a recent automatic annotation pipeline, to serve the
needs of large-scale pre-training. This involves extending the pipeline with
cutting-edge segmentation models as well as making it robust to the challenges
of large-scale processing. Further, we push forward the state-of-the-art
performance on the ScanNet and ScanNet200 datasets with prevalent 3D semantic
segmentation models, demonstrating the efficacy of our generated dataset.