
MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation

December 10, 2024
Authors: Bo Li, Shaolin Zhu, Lijie Wen
cs.AI

Abstract

Image Translation (IT) holds immense potential across diverse domains, enabling the translation of textual content within images into various languages. However, existing datasets often suffer from limitations in scale, diversity, and quality, hindering the development and evaluation of IT models. To address this issue, we introduce MIT-10M, a large-scale parallel corpus of multilingual image translation with over 10M image-text pairs derived from real-world data, which has undergone extensive data cleaning and multilingual translation validation. It contains 840K images in three sizes, 28 categories, tasks at three difficulty levels, and image-text pairs in 14 languages, a considerable improvement over existing datasets. We conduct extensive experiments to evaluate and train models on MIT-10M. The results clearly indicate that our dataset is better suited to evaluating how models handle challenging and complex real-world image translation tasks. Moreover, the model fine-tuned on MIT-10M achieves three times the performance of the baseline model, further confirming its superiority.
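
For concreteness, the sketch below shows one way such image-text pairs could be represented and filtered by target language and difficulty. The schema is purely an assumption for illustration: the field names (`source_text`, `difficulty`, etc.) and the difficulty labels are hypothetical, since the abstract does not specify the dataset's actual storage format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical record schema for one MIT-10M image-text pair.
# Field names and difficulty labels are illustrative assumptions,
# not the dataset's documented format.
@dataclass
class ImageTranslationPair:
    image_path: str   # path to the source image (one of three sizes)
    source_text: str  # text content appearing in the image
    target_text: str  # validated translation of that text
    source_lang: str  # e.g. "en"
    target_lang: str  # one of the 14 languages
    category: str     # one of the 28 task categories
    difficulty: str   # "easy" | "medium" | "hard" (three levels)

def filter_pairs(pairs: List[ImageTranslationPair],
                 target_lang: str,
                 difficulty: str) -> List[ImageTranslationPair]:
    """Select the subset for a given target language and difficulty level."""
    return [p for p in pairs
            if p.target_lang == target_lang and p.difficulty == difficulty]

if __name__ == "__main__":
    # Toy data only, for demonstration.
    corpus = [
        ImageTranslationPair("img/0001_small.jpg", "Exit", "Sortie",
                             "en", "fr", "signage", "easy"),
        ImageTranslationPair("img/0002_large.jpg", "Fresh bread daily",
                             "Pain frais tous les jours",
                             "en", "fr", "storefront", "medium"),
    ]
    subset = filter_pairs(corpus, target_lang="fr", difficulty="easy")
    print(f"{len(subset)} pair(s) selected")
```

A per-language, per-difficulty split like this mirrors how the paper reports evaluation across difficulty levels; the actual loading code would depend on how the released corpus is packaged.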
