AIN: The Arabic INclusive Large Multimodal Model
January 31, 2025
Authors: Ahmed Heakl, Sara Ghaboura, Omkar Thawkar, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan
cs.AI
Abstract
Amid the swift progress of large language models (LLMs) and their evolution
into large multimodal models (LMMs), significant strides have been made in
high-resource languages such as English and Chinese. While Arabic LLMs have
seen notable progress, Arabic LMMs remain largely unexplored, often narrowly
focusing on a few specific aspects of the language and visual understanding. To
bridge this gap, we introduce AIN, the Arabic Inclusive Multimodal Model,
designed to excel across diverse domains. AIN is an English-Arabic bilingual
LMM designed to excel in both languages, leveraging 3.6 million carefully
constructed, high-quality Arabic-English multimodal data samples.
AIN demonstrates state-of-the-art Arabic performance, while also possessing
strong English-language visual capabilities. On the recent CAMEL-Bench
benchmark comprising 38 sub-domains, including multi-image understanding,
complex visual perception, handwritten document understanding, video
understanding, medical imaging, plant diseases, and remote sensing-based land
use understanding, AIN demonstrates strong performance, with the 7B model
outperforming GPT-4o by an absolute gain of 3.4% averaged over eight domains
and 38 sub-domains. AIN's superior capabilities position it as a significant
step toward empowering Arabic speakers with advanced multimodal generative AI
tools across diverse applications.