
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

November 25, 2024
Authors: Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana, Noor Ahsan, Nevasini Sasikumar, Omkar Thawakar, Henok Biadglign Ademtew, Yahya Hmaiti, Amandeep Kumar, Kartik Kuckreja, Mykola Maslych, Wafa Al Ghallabi, Mihail Mihaylov, Chao Qin, Abdelrahman M Shaker, Mike Zhang, Mahardika Krisna Ihsani, Amiel Esplana, Monil Gokani, Shachar Mirkin, Harsh Singh, Ashay Srivastava, Endre Hamerlik, Fathinah Asma Izzati, Fadillah Adamsyah Maani, Sebastian Cavada, Jenny Chim, Rohit Gupta, Sanjay Manjunath, Kamila Zhumakhanova, Feno Heriniaina Rabevohitra, Azril Amirudin, Muhammad Ridzuan, Daniya Kareem, Ketan More, Kunyang Li, Pramesh Shakya, Muhammad Saad, Amirpouya Ghasemaghaei, Amirbek Djanibekov, Dilshod Azizov, Branislava Jankovic, Naman Bhatia, Alvaro Cabrera, Johan Obando-Ceron, Olympiah Otieno, Fabian Farestam, Muztoba Rabbani, Sanoojan Baliah, Santosh Sanjeev, Abduragim Shtanchaev, Maheen Fatima, Thao Nguyen, Amrin Kareem, Toluwani Aremu, Nathan Xavier, Amit Bhatkal, Hawau Toyin, Aman Chadha, Hisham Cholakkal, Rao Muhammad Anwer, Michael Felsberg, Jorma Laaksonen, Thamar Solorio, Monojit Choudhury, Ivan Laptev, Mubarak Shah, Salman Khan, Fahad Khan
cs.AI

Abstract

Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in LMM research. The benchmark offers a robust and nuanced evaluation framework featuring various question formats, including true/false, multiple choice, and open-ended questions, which are further divided into short- and long-answer categories. ALM-bench's design ensures a comprehensive assessment of a model's ability to handle varied levels of difficulty in visual and linguistic reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully curates content from 13 distinct cultural aspects, ranging from traditions and rituals to famous personalities and celebrations. Through this, ALM-bench not only provides a rigorous testing ground for state-of-the-art open- and closed-source LMMs but also highlights the importance of cultural and linguistic inclusivity, encouraging the development of models that can serve diverse global populations effectively. Our benchmark is publicly available.
