LLäMmlein: Compact and Competitive German-Only Language Models from Scratch

November 17, 2024
Authors: Jan Pfister, Julia Wunderle, Andreas Hotho
cs.AI

Abstract

We create two German-only decoder models, LLäMmlein 120M and 1B, transparently from scratch and publish them, along with the training data, for the German NLP research community to use. The model training involved several key steps: extensive data preprocessing, the creation of a custom German tokenizer, the training itself, and the evaluation of the final models on various benchmarks. Throughout the training process, multiple checkpoints were saved and analyzed using the SuperGLEBer benchmark to monitor the models' learning dynamics. Compared to state-of-the-art models on the SuperGLEBer benchmark, both LLäMmlein models performed competitively, consistently matching or surpassing models with similar parameter sizes. The results show that the models' quality scales with size as expected, but performance improvements on some tasks plateaued early, offering valuable insights into resource allocation for future model development.
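
Since the abstract's main deliverable is the published checkpoints, a minimal usage sketch follows, assuming the models are hosted on the Hugging Face Hub and load through the standard transformers API; the repository ID and the German prompt below are illustrative assumptions, not details taken from the paper.

```python
# A minimal usage sketch, assuming the released checkpoints are published on the
# Hugging Face Hub. The repository ID below is a hypothetical placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "LSX-UniWue/LLaMmlein_1B"  # hypothetical hub ID for the 1B model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Generate a short German continuation with the decoder-only model.
inputs = tokenizer("Die Hauptstadt von Bayern ist", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the models use the custom German tokenizer described above, the matching tokenizer must be loaded from the same repository rather than reused from another LLaMA-style model.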
