MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching

February 18, 2025
Authors: Fabian David Schmidt, Florian Schneider, Chris Biemann, Goran Glavaš
cs.AI

Abstract

Existing multilingual vision-language (VL) benchmarks often cover only a handful of languages. Consequently, evaluations of large vision-language models (LVLMs) predominantly target high-resource languages, underscoring the need for evaluation data for low-resource languages. To address this limitation, we introduce MVL-SIB, a massively multilingual vision-language benchmark that evaluates both cross-modal and text-only topical matching across 205 languages, over 100 more than the most multilingual existing VL benchmarks cover. We then benchmark a range of open-weight LVLMs together with GPT-4o(-mini) on MVL-SIB. Our results reveal that LVLMs struggle with cross-modal topic matching in lower-resource languages, performing no better than chance on languages like N'Koo. A comparison of cross-modal and text-only topical matching performance further shows that VL support in LVLMs declines disproportionately relative to textual support for lower-resource languages. We also observe that open-weight LVLMs do not benefit from representing a topic with more than one image, suggesting that these models are not yet fully effective at handling multi-image tasks. By correlating performance on MVL-SIB with other multilingual VL benchmarks, we highlight that MVL-SIB serves as a comprehensive probe of multilingual VL understanding in LVLMs.

February 20, 2025