Kuwain 1.5B: An Arabic SLM via Language Injection
April 21, 2025
Authors: Khalil Hennara, Sara Chrouf, Mohamed Motaism Hamed, Zeina Aldallal, Omar Hadid, Safwan AlModhayan
cs.AI
Abstract
Enhancing existing models with new knowledge is a crucial aspect of AI
development. This paper introduces a novel method for integrating a new
language into a large language model (LLM). Our approach successfully
incorporates a previously unseen target language into an existing LLM without
compromising its prior knowledge. We trained a compact 1.5-billion-parameter
model named Kuwain by injecting the Arabic language into a small open-source
model trained mainly on English. Our method demonstrates significant
improvements in Arabic language performance, with an average 8% improvement
across various benchmarks, while preserving the model's existing knowledge
using only a minimal amount of the original model's data. This offers a
cost-effective alternative to training a comprehensive model in both English
and Arabic. The results highlight the potential for efficient, targeted
language model expansion without extensive retraining or resource-intensive
processes.
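
The abstract leaves the injection mechanism unspecified. One plausible way to realize this kind of language injection, sketched below with the Hugging Face transformers library, is to extend the base model's tokenizer with target-language tokens, resize the embedding matrix, and continue pretraining on a mixture dominated by the new language plus a small "replay" slice of the original data. The base checkpoint name, token list, and data mixture here are illustrative assumptions, not the paper's published recipe.

```python
# Minimal sketch of vocabulary-level language injection with replay data.
# All identifiers below (BASE, the token list, the corpora) are hypothetical
# placeholders; the abstract does not confirm this exact procedure.
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "base-english-slm"  # placeholder for a small English-centric open model

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Step 1: inject the target language at the vocabulary level. Adding Arabic
# tokens keeps text from being shredded into byte fallbacks; the embedding
# matrix is then grown to cover the enlarged vocabulary.
new_arabic_tokens = ["مرحبا", "العالم"]  # placeholder; a real run adds thousands
tokenizer.add_tokens(new_arabic_tokens)
model.resize_token_embeddings(len(tokenizer))

# Step 2: continue pretraining on a mixture dominated by Arabic text with a
# small replay slice of the base model's original English data, so prior
# knowledge is retained. (Standard causal-LM training loop elided.)
arabic_corpus = ["..."]   # new-language training text
english_replay = ["..."]  # small fraction of the original training data
train_mixture = arabic_corpus + english_replay
```

The replay slice is the part that matches the abstract's claim of retaining existing knowledge with minimal original data; without it, continued pretraining on Arabic alone would typically degrade the model's English performance.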