IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding
January 27, 2025
Authors: Sankalp KJ, Ashutosh Kumar, Laxmaan Balaji, Nikunj Kotecha, Vinija Jain, Aman Chadha, Sreyoshi Bhaduri
cs.AI
Abstract
Spoken by more than 1.5 billion people in the Indian subcontinent, Indic
languages present unique challenges and opportunities for natural language
processing (NLP) research due to their rich cultural heritage, linguistic
diversity, and complex structures. IndicMMLU-Pro is a comprehensive benchmark
designed to evaluate Large Language Models (LLMs) across Indic languages,
building upon the MMLU Pro (Massive Multitask Language Understanding)
framework. Covering major languages such as Hindi, Bengali, Gujarati, Marathi,
Kannada, Punjabi, Tamil, Telugu, and Urdu, our benchmark addresses the unique
challenges and opportunities presented by the linguistic diversity of the
Indian subcontinent. This benchmark encompasses a wide range of tasks in
language comprehension, reasoning, and generation, meticulously crafted to
capture the intricacies of Indian languages. IndicMMLU-Pro provides a
standardized evaluation framework to push the research boundaries in Indic
language AI, facilitating the development of more accurate, efficient, and
culturally sensitive models. This paper outlines the benchmark's design
principles, task taxonomy, and data collection methodology, and presents
baseline results from state-of-the-art multilingual models.