LLäMmlein: Compact and Competitive German-Only Language Models from Scratch

Abstract

We create two German-only decoder models, LLäMmlein 120M and 1B, transparently from scratch and publish them, along with the training data, for the German NLP research community to use. Model training involved several key steps: extensive data preprocessing, the creation of a custom German tokenizer, the training itself, and the evaluation of the final models on various benchmarks. Throughout training, multiple checkpoints were saved and analyzed with the SuperGLEBer benchmark to monitor the models' learning dynamics. Compared to state-of-the-art models on the SuperGLEBer benchmark, both LLäMmlein models performed competitively, consistently matching or surpassing models with similar parameter counts. The results show that model quality scales with size as expected, but performance improvements on some tasks plateaued early, offering valuable insights into resource allocation for future model development.