MINIMA: Modality Invariant Image Matching
December 27, 2024
Authors: Xingyu Jiang, Jiangwei Ren, Zizhuo Li, Xin Zhou, Dingkang Liang, Xiang Bai
cs.AI
Abstract
Image matching for both cross-view and cross-modality plays a critical role
in multimodal perception. In practice, the modality gap caused by different
imaging systems/styles poses great challenges to the matching task. Existing
works try to extract invariant features for specific modalities and train on
limited datasets, showing poor generalization. In this paper, we present
MINIMA, a unified image matching framework for multiple cross-modal cases.
Without pursuing fancy modules, our MINIMA aims to enhance universal
performance from the perspective of data scaling up. For this purpose, we
propose a simple yet effective data engine that can freely produce a large
dataset containing multiple modalities, rich scenarios, and accurate matching
labels. Specifically, we scale up the modalities from cheap but rich RGB-only
matching data, by means of generative models. Under this setting, the matching
labels and rich diversity of the RGB dataset are well inherited by the
generated multimodal data. Benefiting from this, we construct MD-syn, a new
comprehensive dataset that fills the data gap for general multimodal image
matching. With MD-syn, we can directly train any advanced matching pipeline on
randomly selected modality pairs to obtain cross-modal ability. Extensive
experiments on in-domain and zero-shot matching tasks, including 19
cross-modal cases, demonstrate that our MINIMA can significantly outperform the
baselines and even surpass modality-specific methods. The dataset and code are
available at https://github.com/LSXI7/MINIMA.
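The core idea of the data engine is that modality-translated images stay pixel-aligned with their RGB sources, so the ground-truth correspondences of an RGB pair carry over unchanged to the generated cross-modal pair. The sketch below illustrates this with NumPy; `fake_modality_transfer` is a hypothetical stand-in for the generative model (the paper uses learned image-to-image generation), and the function and variable names are illustrative assumptions, not the authors' API.

```python
import numpy as np

def fake_modality_transfer(rgb, modality):
    """Hypothetical stand-in for a generative modality-translation model.
    A real engine would run an image-to-image generative network; here we
    only simulate a pixel-aligned style change (a grayscale 'infrared')."""
    if modality == "infrared":
        gray = rgb.mean(axis=2, keepdims=True)  # collapse color channels
        return np.repeat(gray, 3, axis=2)       # same H x W x 3 geometry
    raise ValueError(f"unknown modality: {modality}")

def synthesize_cross_modal_pair(rgb_a, rgb_b, matches, modality):
    """Turn one RGB matching pair into a cross-modal pair.

    Because the generated image is pixel-aligned with its RGB source,
    the match labels (pixel correspondences) are inherited as-is."""
    gen_b = fake_modality_transfer(rgb_b, modality)
    return rgb_a, gen_b, matches  # labels reused without modification

# Toy RGB pair with one ground-truth correspondence (x, y) <-> (x', y').
rgb_a = np.random.rand(8, 8, 3)
rgb_b = np.random.rand(8, 8, 3)
matches = [((1, 2), (3, 4))]

a, b, m = synthesize_cross_modal_pair(rgb_a, rgb_b, matches, "infrared")
```

Any matching pipeline can then sample `(a, b, m)` triplets over randomly chosen modalities during training, which is how the abstract's "randomly selected modality pairs" setup can be read.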