MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training

Abstract

Image matching, which aims to identify corresponding pixel locations between images, is crucial in a wide range of scientific disciplines, where it supports image registration, fusion, and analysis. In recent years, deep learning-based image matching algorithms have dramatically outperformed humans at finding large numbers of correspondences quickly and accurately. However, when images are captured under different imaging modalities and therefore differ significantly in appearance, the performance of these algorithms often deteriorates because annotated cross-modal training data are scarce. This limitation hinders applications in the many fields that rely on multiple image modalities to obtain complementary information. To address this challenge, we propose a large-scale pre-training framework that uses synthetic cross-modal training signals and incorporates diverse data from various sources to train models to recognize and match fundamental structures across images. This capability transfers to real-world, unseen cross-modality image matching tasks. Our key finding is that a matching model trained with our framework generalizes remarkably well across more than eight unseen cross-modality registration tasks using the same network weights, substantially outperforming existing methods, whether designed for generalization or tailored to specific tasks. This advancement significantly broadens the applicability of image matching technologies across scientific disciplines and paves the way for new applications in multi-modality human and artificial intelligence analysis and beyond.
