Kuwain 1.5B: An Arabic SLM via Language Injection

April 21, 2025
Authors: Khalil Hennara, Sara Chrouf, Mohamed Motaism Hamed, Zeina Aldallal, Omar Hadid, Safwan AlModhayan
cs.AI

Abstract

Enhancing existing models with new knowledge is a crucial aspect of AI development. This paper introduces a novel method for integrating a new language into a large language model (LLM). Our approach successfully incorporates a previously unseen target language into an existing LLM without compromising its prior knowledge. We trained Kuwain, a tiny model with 1.5 billion parameters, by injecting Arabic into a small open-source model trained mainly on English. Our method yields significant gains in Arabic performance, with an average improvement of 8% across various benchmarks, while preserving the model's existing knowledge using only a small amount of the original model's data. This offers a cost-effective alternative to training a comprehensive model on both English and Arabic from scratch. The results highlight the potential for efficient, targeted language model expansion without extensive retraining or resource-intensive processes.
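
The abstract does not spell out the training recipe, but the claim that prior knowledge is preserved using "only a small amount of the original model's data" points to a replay-style data mixture: continued pretraining on the new language with a small sample of the original English corpus mixed in. Below is a minimal sketch of that idea; the function name, the document lists, and the `replay_fraction` ratio are all hypothetical illustrations, not values taken from the paper.

```python
import random

def build_training_mixture(arabic_docs, english_docs, replay_fraction=0.25, seed=0):
    """Mix new-language data with a small replay sample of the original
    model's data to mitigate catastrophic forgetting.

    arabic_docs     : documents in the new target language
    english_docs    : documents from the original training distribution
                      (only a small slice is reused)
    replay_fraction : size of the replay sample relative to the Arabic set
                      (hypothetical ratio; the paper only states that a
                      small amount of the original data is needed)
    """
    rng = random.Random(seed)
    n_replay = int(len(arabic_docs) * replay_fraction)
    # Draw a small replay sample from the original corpus.
    replay = rng.sample(english_docs, min(n_replay, len(english_docs)))
    # Interleave Arabic and replay documents for continued pretraining.
    mixture = arabic_docs + replay
    rng.shuffle(mixture)
    return mixture

# Toy usage: roughly 4:1 Arabic-to-English mixture.
arabic = [f"ar_doc_{i}" for i in range(800)]
english = [f"en_doc_{i}" for i in range(10_000)]
batch = build_training_mixture(arabic, english)
print(len(batch), batch[:3])
```

The intuition behind such a mixture is that even a small replay sample keeps the model anchored to its original distribution, so updates driven by the Arabic data are less likely to overwrite its existing English capabilities.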
