Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
April 9, 2025
Authors: Pedro Hermosilla, Christian Stippel, Leon Sick
cs.AI
Abstract
Self-supervised learning has transformed 2D computer vision by enabling
models trained on large, unannotated datasets to provide versatile
off-the-shelf features that perform similarly to models trained with labels.
However, in 3D scene understanding, self-supervised methods are typically only
used as a weight initialization step for task-specific fine-tuning, limiting
their utility for general-purpose feature extraction. This paper addresses this
shortcoming by proposing a robust evaluation protocol specifically designed to
assess the quality of self-supervised features for 3D scene understanding. Our
protocol uses multi-resolution feature sampling of hierarchical models to
create rich point-level representations that capture the semantic capabilities
of the model and, hence, are suitable for evaluation with linear probing and
nearest-neighbor methods. Furthermore, we introduce the first self-supervised
model that performs similarly to supervised models when only off-the-shelf
features are used in a linear probing setup. In particular, our model is
trained natively in 3D with a novel self-supervised approach based on a Masked
Scene Modeling objective, which reconstructs deep features of masked patches in
a bottom-up manner and is specifically tailored to hierarchical 3D models. Our
experiments show not only that our method achieves performance competitive with
supervised models, but also that it surpasses existing self-supervised
approaches by a large margin. The model and training code are available in our
GitHub repository (https://github.com/phermosilla/msm).
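The abstract says the evaluation protocol samples features from several resolutions of a hierarchical model to build point-level descriptors, which are then scored with linear probing or nearest neighbors. It does not give the exact sampling rule, so the following is only a minimal sketch under stated assumptions: each level of the hierarchy is given as a list of positions with one feature vector per position, and every input point takes the feature of its nearest position at each level, concatenated finest-first. The helpers `multires_point_features` and `knn_predict` are hypothetical names and are not taken from the authors' repository. A linear probe would instead fit a linear classifier on the same descriptors.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
# One hierarchy level (assumed layout): (positions of its points/voxels, one feature vector per position).
Level = Tuple[Sequence[Point], Sequence[Sequence[float]]]


def nearest_index(query: Point, positions: Sequence[Point]) -> int:
    """Index of the position closest to `query` (brute-force Euclidean search)."""
    return min(range(len(positions)), key=lambda i: math.dist(query, positions[i]))


def multires_point_features(points: Sequence[Point], levels: Sequence[Level]) -> List[List[float]]:
    """Hypothetical multi-resolution sampling: for every input point, look up the
    nearest position at each level (finest first) and concatenate those features."""
    descriptors = []
    for p in points:
        desc: List[float] = []
        for positions, features in levels:
            desc.extend(features[nearest_index(p, positions)])
        descriptors.append(desc)
    return descriptors


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                query: Sequence[float], k: int = 1) -> str:
    """Majority label among the k training descriptors closest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(query, train_x[i]))
    votes: Dict[str, int] = {}
    for i in order[:k]:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(votes, key=votes.get)


# Toy scene: four points on a line, a fine level and a coarse (pooled) level.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
fine = (pts, [[0.1], [0.2], [0.9], [1.0]])
coarse = ([(0.5, 0.0, 0.0), (5.5, 0.0, 0.0)], [[0.0, 1.0], [1.0, 0.0]])
descs = multires_point_features(pts, [fine, coarse])
print(descs[1])  # [0.2, 0.0, 1.0]
# Label points 0 and 2, then classify points 1 and 3 from their frozen descriptors alone.
print(knn_predict([descs[0], descs[2]], ["chair", "table"], descs[1]))  # chair
print(knn_predict([descs[0], descs[2]], ["chair", "table"], descs[3]))  # table
```

Because the encoder stays frozen and only a nearest-neighbor lookup (or a linear layer) sits on top, the score reflects how much semantic information the off-the-shelf features already carry. Concatenating coarse levels gives each point context that the finest level alone may lack.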
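The Masked Scene Modeling objective is described here only in outline: deep features of masked patches are reconstructed, bottom-up, in a way tailored to hierarchical 3D models. The sketch below shows just the generic ingredient shared by masked-feature objectives, under explicit assumptions. Patches are already formed, a random subset is masked, per-patch target features are given (for example, produced by some target encoder), and the loss is 1 minus cosine similarity averaged over masked patches only. The bottom-up, level-by-level schedule, the encoder that sees the masked scene, and the source of the targets are not modeled. `sample_mask` and `masked_feature_loss` are hypothetical helpers and not the paper's implementation.

```python
import math
import random
from typing import List, Sequence


def sample_mask(num_patches: int, ratio: float, rng: random.Random) -> List[bool]:
    """Randomly mark round(ratio * num_patches) patches as masked (True)."""
    n_masked = round(ratio * num_patches)
    masked = set(rng.sample(range(num_patches), n_masked))
    return [i in masked for i in range(num_patches)]


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity of two vectors; eps guards against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def masked_feature_loss(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]],
                        mask: Sequence[bool]) -> float:
    """Mean (1 - cosine similarity) between predicted and target patch features,
    averaged over masked patches only; visible patches do not contribute."""
    terms = [1.0 - cosine_similarity(p, t) for p, t, m in zip(pred, target, mask) if m]
    if not terms:
        raise ValueError("mask selects no patches")
    return sum(terms) / len(terms)


rng = random.Random(0)
mask = sample_mask(num_patches=8, ratio=0.5, rng=rng)
print(sum(mask))  # 4
target = [[1.0, 0.0]] * 8                               # toy per-patch target features
pred = [[1.0, 0.0] if m else [0.0, 1.0] for m in mask]  # correct on masked patches only
print(masked_feature_loss(pred, target, mask))  # 0.0
```

The usage shows why the loss is restricted to masked patches: predictions for visible patches are deliberately wrong yet do not affect the score, so the model is rewarded only for inferring content it could not see.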