MINIMA: Modality Invariant Image Matching
December 27, 2024
Authors: Xingyu Jiang, Jiangwei Ren, Zizhuo Li, Xin Zhou, Dingkang Liang, Xiang Bai
cs.AI
Abstract
Image matching across both views and modalities plays a critical role
in multimodal perception. In practice, the modality gap caused by different
imaging systems/styles poses great challenges to the matching task. Existing
works try to extract invariant features for specific modalities and train on
limited datasets, showing poor generalization. In this paper, we present
MINIMA, a unified image matching framework for multiple cross-modal cases.
Rather than pursuing fancy modules, MINIMA aims to enhance universal
performance by scaling up data. To this end, we propose a simple yet effective
data engine that can freely produce a large dataset containing multiple
modalities, rich scenarios, and accurate matching labels. Specifically, we use
generative models to scale up the modalities of cheap but abundant RGB-only
matching data. In this way, the generated multimodal data inherits both the
matching labels and the rich diversity of the RGB dataset. Benefiting from
this, we construct MD-syn, a new
comprehensive dataset that fills the data gap for general multimodal image
matching. With MD-syn, we can directly train any advanced matching pipeline on
randomly selected modality pairs to obtain cross-modal ability. Extensive
experiments on in-domain and zero-shot matching tasks, including 19
cross-modal cases, demonstrate that our MINIMA can significantly outperform the
baselines and even surpass modality-specific methods. The dataset and code are
available at https://github.com/LSXI7/MINIMA.
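The abstract claims that matching labels carry over from RGB pairs to the generated modalities. The Python sketch below illustrates that idea. It is not the authors' data engine.

It rests on three assumptions:

* Each modality translator changes appearance but not pixel geometry, so a ground-truth correspondence ((x0, y0), (x1, y1)) stays valid after translation.
* The two views of a pair draw their target modality independently at random.
* The per-pixel transforms are toy stand-ins for the generative models.

The names `MatchingPair`, `TRANSLATORS`, `pixelwise` and `synthesize_pair` are hypothetical, and so are the modality names. The repository linked above holds the actual implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]                         # toy grayscale image, values in [0, 1]
Match = Tuple[Tuple[int, int], Tuple[int, int]]   # ((x0, y0), (x1, y1)) correspondence


@dataclass
class MatchingPair:
    image0: Image
    image1: Image
    matches: List[Match]       # ground-truth correspondences between the two views
    modality0: str = "rgb"
    modality1: str = "rgb"


def pixelwise(fn: Callable[[float], float]) -> Callable[[Image], Image]:
    """Wrap a per-pixel function as a geometry-preserving toy 'translator'."""
    def translate(img: Image) -> Image:
        return [[fn(v) for v in row] for row in img]
    return translate


# Hypothetical stand-ins for generative modality translators. Each one returns a
# new image of the same size, so pixel coordinates (and thus matches) are unchanged.
TRANSLATORS: Dict[str, Callable[[Image], Image]] = {
    "rgb": lambda img: [row[:] for row in img],
    "infrared": pixelwise(lambda v: 1.0 - v),
    "depth": pixelwise(lambda v: v ** 2),
    "sketch": pixelwise(lambda v: 1.0 if v > 0.5 else 0.0),
}


def synthesize_pair(pair: MatchingPair, rng: random.Random) -> MatchingPair:
    """Turn an RGB pair into a random cross-modal pair that reuses its labels."""
    m0 = rng.choice(sorted(TRANSLATORS))   # modality for view 0
    m1 = rng.choice(sorted(TRANSLATORS))   # modality for view 1, drawn independently
    return MatchingPair(
        image0=TRANSLATORS[m0](pair.image0),
        image1=TRANSLATORS[m1](pair.image1),
        matches=list(pair.matches),        # labels inherited unchanged from RGB
        modality0=m0,
        modality1=m1,
    )


# Usage: draw one synthetic cross-modal training sample from a tiny RGB pair.
rgb_pair = MatchingPair(
    image0=[[0.2, 0.8], [0.6, 0.4]],
    image1=[[0.8, 0.2], [0.4, 0.6]],
    matches=[((0, 0), (1, 0)), ((1, 1), (0, 1))],
)
sample = synthesize_pair(rgb_pair, random.Random(0))
print(sample.modality0, sample.modality1, sample.matches)
```

Only the image appearance and the modality tags change, while `matches` is copied verbatim. This is why, under the geometry-preserving assumption, no new annotation is needed for the synthetic pairs. A training loop would call `synthesize_pair` on each RGB sample to obtain randomly selected modality pairs.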