MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

Abstract

Despite the rapidly growing demand for multimodal retrieval, progress in this field remains severely constrained by a lack of training data. In this paper, we introduce MegaPairs, a novel data synthesis method that leverages vision language models (VLMs) and open-domain images, together with a massive synthetic dataset generated from this method. Our empirical analysis shows that MegaPairs generates high-quality data, enabling the multimodal retriever to significantly outperform the baseline model trained on 70 times more data from existing datasets. Moreover, since MegaPairs relies solely on general image corpora and open-source VLMs, it can be easily scaled up, enabling continuous improvements in retrieval performance. At this stage, we have produced more than 26 million training instances and trained several models of varying sizes using this data. These new models achieve state-of-the-art zero-shot performance across 4 popular composed image retrieval (CIR) benchmarks and the highest overall performance on the 36 datasets provided by MMEB. They also demonstrate notable performance improvements with additional downstream fine-tuning. Our dataset, trained models, and data synthesis pipeline will be made publicly available to facilitate the future development of this field.
