Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
April 9, 2025
Authors: Pedro Hermosilla, Christian Stippel, Leon Sick
cs.AI
Abstract
Self-supervised learning has transformed 2D computer vision by enabling
models trained on large, unannotated datasets to provide versatile
off-the-shelf features that perform similarly to models trained with labels.
However, in 3D scene understanding, self-supervised methods are typically only
used as a weight initialization step for task-specific fine-tuning, limiting
their utility for general-purpose feature extraction. This paper addresses this
shortcoming by proposing a robust evaluation protocol specifically designed to
assess the quality of self-supervised features for 3D scene understanding. Our
protocol uses multi-resolution feature sampling of hierarchical models to
create rich point-level representations that capture the semantic capabilities
of the model and, hence, are suitable for evaluation with linear probing and
nearest-neighbor methods. Furthermore, we introduce the first self-supervised
model that performs similarly to supervised models when only off-the-shelf
features are used in a linear probing setup. In particular, our model is
trained natively in 3D with a novel self-supervised approach based on a Masked
Scene Modeling objective, which reconstructs deep features of masked patches in
a bottom-up manner and is specifically tailored to hierarchical 3D models. Our
experiments show that our method not only achieves performance competitive
with supervised models but also surpasses existing self-supervised approaches
by a large margin. The model and training code are available in our GitHub
repository (https://github.com/phermosilla/msm).
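The abstract says the evaluation protocol builds point-level representations by sampling features at multiple resolutions of a hierarchical model. A minimal NumPy sketch of that idea follows. It assumes each level of the hierarchy exposes its point positions and feature vectors, and it assumes a nearest-neighbor lookup per level followed by concatenation; the authors' actual sampling or interpolation scheme may differ. The helper names `nearest_indices` and `multires_point_features` and the `levels` layout are hypothetical.

```python
import numpy as np


def nearest_indices(queries, keys):
    """Return, for every query point, the index of its closest key point."""
    # Pairwise squared Euclidean distances, shape (num_queries, num_keys).
    d2 = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def multires_point_features(points, levels):
    """Build point-level features by sampling every level of a hierarchy.

    points: (N, 3) array of input point coordinates.
    levels: list of (positions, features) pairs, one per resolution level,
            positions of shape (M_l, 3) and features of shape (M_l, C_l).
            This layout is a hypothetical stand-in for a hierarchical encoder.
    Returns an (N, C_0 + C_1 + ...) array.
    """
    sampled = []
    for positions, feats in levels:
        # Nearest-neighbor lookup (assumption); interpolation is an alternative.
        idx = nearest_indices(points, positions)
        sampled.append(feats[idx])
    # Concatenate fine-to-coarse features into one descriptor per point.
    return np.concatenate(sampled, axis=1)


# Usage with random stand-in data for a three-level hierarchy.
rng = np.random.default_rng(0)
points = rng.uniform(size=(100, 3))
levels = [
    (points, rng.normal(size=(100, 32))),                    # finest level
    (rng.uniform(size=(25, 3)), rng.normal(size=(25, 64))),  # coarser level
    (rng.uniform(size=(6, 3)), rng.normal(size=(6, 128))),   # coarsest level
]
point_feats = multires_point_features(points, levels)
print(point_feats.shape)  # (100, 224)
```

Concatenation keeps the fine geometric detail of the early levels next to the wider context of the coarse levels, so a single linear layer or a nearest-neighbor lookup can draw on both. The brute-force distance matrix is quadratic in memory and only suits small examples; a KD-tree or voxel hash would be the usual choice at scene scale.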
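Linear probing and nearest-neighbor evaluation both keep the pretrained features frozen and only measure how much semantic information those features already contain. The following is a hedged illustration on synthetic data, not the paper's protocol. It assumes softmax regression trained by full-batch gradient descent on standardized features, a k-NN classifier using cosine similarity and majority voting (k = 5), and mean IoU as the score. All function names and hyperparameters here are hypothetical.

```python
import numpy as np


def standardize(train, test):
    """Z-score both splits using statistics from the training split only."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0) + 1e-8
    return (train - mu) / sigma, (test - mu) / sigma


def train_linear_probe(feats, labels, num_classes, lr=0.1, epochs=300):
    """Fit softmax regression on frozen features with full-batch gradient descent."""
    n, d = feats.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = feats @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


def knn_predict(train_feats, train_labels, query_feats, num_classes, k=5):
    """Majority vote over the k most cosine-similar training features."""
    a = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    nn = np.argsort(-(q @ a.T), axis=1)[:, :k]
    votes = train_labels[nn]
    return np.array([np.bincount(v, minlength=num_classes).argmax() for v in votes])


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))


# Synthetic frozen "point features": three well-separated Gaussian clusters.
rng = np.random.default_rng(0)
centers = 5.0 * rng.normal(size=(3, 16))
y_train = rng.integers(0, 3, size=600)
y_test = rng.integers(0, 3, size=300)
x_train = centers[y_train] + rng.normal(size=(600, 16))
x_test = centers[y_test] + rng.normal(size=(300, 16))

xs_train, xs_test = standardize(x_train, x_test)
w, b = train_linear_probe(xs_train, y_train, num_classes=3)
probe_pred = (xs_test @ w + b).argmax(axis=1)
knn_pred = knn_predict(x_train, y_train, x_test, num_classes=3)
print("linear probe mIoU:", mean_iou(probe_pred, y_test, 3))
print("k-NN mIoU:", mean_iou(knn_pred, y_test, 3))
```

Standardizing with training-split statistics keeps the gradient-descent step size stable. Neither evaluator updates the features themselves, which is what makes a comparison between self-supervised and supervised backbones under this kind of protocol meaningful.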
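The abstract describes the Masked Scene Modeling objective only at a high level: deep features of masked patches are reconstructed. The sketch below illustrates two generic ingredients of such an objective and is not the authors' method. It assumes that patches are cubic grid cells, that a fixed fraction of them is masked at random, and that the loss is the mean cosine distance between predicted and target features on masked points only. The source of the target features (assumed here to be some network that sees the unmasked scene) and the paper's bottom-up reconstruction across the levels of the hierarchy are not reproduced. The names `patch_mask`, `masked_feature_loss`, `patch_size`, and `mask_ratio` are hypothetical.

```python
import numpy as np


def patch_mask(points, patch_size, mask_ratio, rng):
    """Mask whole grid-cell patches of a point cloud instead of single points.

    Returns a boolean array that is True for points whose patch is masked.
    """
    cells = np.floor(points / patch_size).astype(np.int64)
    # Integer patch id per point; reshape guards against NumPy versions that
    # return a 2-D inverse index when axis is given.
    _, patch_id = np.unique(cells, axis=0, return_inverse=True)
    patch_id = patch_id.reshape(-1)
    num_patches = int(patch_id.max()) + 1
    num_masked = int(round(mask_ratio * num_patches))
    masked = rng.choice(num_patches, size=num_masked, replace=False)
    return np.isin(patch_id, masked)


def masked_feature_loss(pred, target, mask):
    """Mean cosine distance between predicted and target features on masked points."""
    p = pred[mask]
    t = target[mask]
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(p * t, axis=1)))


# Usage with random stand-ins for target and predicted features.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 4.0, size=(2000, 3))
mask = patch_mask(points, patch_size=1.0, mask_ratio=0.5, rng=rng)
target = rng.normal(size=(2000, 64))  # hypothetical target-network features
noisy_pred = target + 0.5 * rng.normal(size=target.shape)
print("masked fraction:", mask.mean())
print("loss (perfect prediction):", masked_feature_loss(target, target, mask))
print("loss (noisy prediction):", masked_feature_loss(noisy_pred, target, mask))
```

Restricting the loss to masked points forces the model to infer hidden content from the surrounding visible context instead of copying its inputs, and masking whole patches rather than scattered points prevents that context from trivially leaking through immediate neighbors.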