MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training
January 13, 2025
Authors: Xingyi He, Hao Yu, Sida Peng, Dongli Tan, Zehong Shen, Hujun Bao, Xiaowei Zhou
cs.AI
Abstract
Image matching, which aims to identify corresponding pixel locations between
images, is crucial in a wide range of scientific disciplines, aiding in image
registration, fusion, and analysis. In recent years, deep learning-based image
matching algorithms have dramatically outperformed humans in rapidly and
accurately finding large numbers of correspondences. However, when dealing with
images captured under different imaging modalities that result in significant
appearance changes, the performance of these algorithms often deteriorates due
to the scarcity of annotated cross-modal training data. This limitation hinders
applications in various fields that rely on multiple image modalities to obtain
complementary information. To address this challenge, we propose a large-scale
pre-training framework that utilizes synthetic cross-modal training signals,
incorporating diverse data from various sources, to train models to recognize
and match fundamental structures across images. This capability is transferable
to real-world, unseen cross-modality image matching tasks. Our key finding is
that the matching model trained with our framework achieves remarkable
generalizability across more than eight unseen cross-modality registration
tasks using the same network weights, substantially outperforming existing
methods, whether designed for generalization or tailored for specific tasks.
This advancement significantly enhances the applicability of image matching
technologies across various scientific disciplines and paves the way for new
applications in multi-modality human and artificial intelligence analysis and
beyond.
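To make the link between matching and registration concrete, the following is a minimal sketch of how a set of pixel correspondences could be turned into a registration transform. It is not the method described in the abstract: it assumes the correspondences are already given (here as hypothetical hand-written points), assumes a simple 2D affine model, and fits it by ordinary least squares with NumPy. The function names `estimate_affine` and `apply_affine` are illustrative only.

```python
import numpy as np


def estimate_affine(src_pts, dst_pts):
    """Fit a 2x3 affine transform mapping src_pts onto dst_pts (least squares)."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Homogeneous source coordinates: one row [x, y, 1] per correspondence.
    A = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve A @ M ~= dst for the 3x2 parameter matrix M.
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M.T  # 2x3 matrix: linear part in the first two columns, translation last


def apply_affine(T, pts):
    """Apply a 2x3 affine transform to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ T[:, :2].T + T[:, 2]


# Hypothetical correspondences (assumption): pixel locations in image A and
# their matches in image B, generated from a known scale-and-shift transform.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
true_T = np.array([[2.0, 0.0, 5.0],
                   [0.0, 2.0, -3.0]])
dst = apply_affine(true_T, src)

# Recover the transform from the correspondences alone.
T = estimate_affine(src, dst)
```

In practice, correspondences from a learned matcher contain outliers, so a robust estimator would usually replace the plain least-squares fit used in this sketch.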