All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
November 25, 2024
作者: Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana, Noor Ahsan, Nevasini Sasikumar, Omkar Thawakar, Henok Biadglign Ademtew, Yahya Hmaiti, Amandeep Kumar, Kartik Kuckreja, Mykola Maslych, Wafa Al Ghallabi, Mihail Mihaylov, Chao Qin, Abdelrahman M Shaker, Mike Zhang, Mahardika Krisna Ihsani, Amiel Esplana, Monil Gokani, Shachar Mirkin, Harsh Singh, Ashay Srivastava, Endre Hamerlik, Fathinah Asma Izzati, Fadillah Adamsyah Maani, Sebastian Cavada, Jenny Chim, Rohit Gupta, Sanjay Manjunath, Kamila Zhumakhanova, Feno Heriniaina Rabevohitra, Azril Amirudin, Muhammad Ridzuan, Daniya Kareem, Ketan More, Kunyang Li, Pramesh Shakya, Muhammad Saad, Amirpouya Ghasemaghaei, Amirbek Djanibekov, Dilshod Azizov, Branislava Jankovic, Naman Bhatia, Alvaro Cabrera, Johan Obando-Ceron, Olympiah Otieno, Fabian Farestam, Muztoba Rabbani, Sanoojan Baliah, Santosh Sanjeev, Abduragim Shtanchaev, Maheen Fatima, Thao Nguyen, Amrin Kareem, Toluwani Aremu, Nathan Xavier, Amit Bhatkal, Hawau Toyin, Aman Chadha, Hisham Cholakkal, Rao Muhammad Anwer, Michael Felsberg, Jorma Laaksonen, Thamar Solorio, Monojit Choudhury, Ivan Laptev, Mubarak Shah, Salman Khan, Fahad Khan
cs.AI
Abstract
Existing Large Multimodal Models (LMMs) generally focus on only a few regions
and languages. As LMMs continue to improve, it is increasingly important to
ensure they understand cultural contexts, respect local sensitivities, and
support low-resource languages, all while effectively integrating corresponding
visual cues. In pursuit of culturally diverse global multimodal models, our
proposed All Languages Matter Benchmark (ALM-bench) represents the largest and
most comprehensive effort to date for evaluating LMMs across 100 languages.
ALM-bench challenges existing models by testing their ability to understand and
reason about culturally diverse images paired with text in various languages,
including many low-resource languages traditionally underrepresented in LMM
research. The benchmark offers a robust and nuanced evaluation framework
featuring various question formats, including true/false, multiple choice, and
open-ended questions, which are further divided into short and long-answer
categories. The ALM-bench design ensures a comprehensive assessment of a model's
ability to handle varied levels of difficulty in visual and linguistic
reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully
curates content from 13 distinct cultural aspects, ranging from traditions and
rituals to famous personalities and celebrations. Through this, ALM-bench not
only provides a rigorous testing ground for state-of-the-art open and
closed-source LMMs but also highlights the importance of cultural and
linguistic inclusivity, encouraging the development of models that can serve
diverse global populations effectively. Our benchmark is publicly available.
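To make the evaluation framework concrete, the sketch below shows one plausible way an ALM-bench-style sample (language, cultural aspect, question format) could be represented and scored. The field names, the exact-match scoring, and the example questions are illustrative assumptions, not the benchmark's actual schema or protocol; in particular, the paper's open-ended questions would require a separate judging procedure.

```python
# Hypothetical sketch of an ALM-bench-style sample and a closed-form scorer.
# Field names and example content are assumptions for illustration only.
from dataclasses import dataclass

# The abstract names four formats: true/false, multiple choice, and
# open-ended questions split into short- and long-answer categories.
QUESTION_TYPES = {"true_false", "multiple_choice",
                  "open_ended_short", "open_ended_long"}


@dataclass
class Sample:
    language: str         # one of the 100 evaluated languages
    cultural_aspect: str  # one of the 13 aspects, e.g. "traditions"
    question_type: str    # member of QUESTION_TYPES
    question: str
    answer: str


def exact_match_accuracy(samples, predictions):
    """Score only closed-form questions (true/false, multiple choice)
    by case-insensitive exact match; open-ended answers are skipped,
    since they would need a separate judge (e.g. human or LLM-based)."""
    closed = [(s, p) for s, p in zip(samples, predictions)
              if s.question_type in {"true_false", "multiple_choice"}]
    if not closed:
        return 0.0
    correct = sum(s.answer.strip().lower() == p.strip().lower()
                  for s, p in closed)
    return correct / len(closed)


samples = [
    Sample("Amharic", "traditions", "true_false",
           "Is the pictured dish a staple of this celebration?", "True"),
    Sample("Urdu", "celebrations", "multiple_choice",
           "Which festival is shown in the image?", "B"),
]
print(exact_match_accuracy(samples, ["true", "C"]))  # 1 of 2 correct -> 0.5
```

Splitting scoring by question type this way mirrors the benchmark's stated design: closed-form items can be checked automatically, while short- and long-answer open-ended items are routed to a richer evaluation.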