Densing Law of LLMs

Abstract

Large Language Models (LLMs) have emerged as a milestone in artificial intelligence, and their performance can improve as model size increases. However, this scaling poses significant challenges for training and inference efficiency, particularly when deploying LLMs in resource-constrained environments, and the scaling trend is becoming increasingly unsustainable. This paper introduces "capacity density" as a new metric for evaluating the quality of LLMs across different scales, and uses it to describe the trend of LLMs in terms of both effectiveness and efficiency. To calculate the capacity density of a given target LLM, we first introduce a set of reference models and develop a scaling law that predicts the downstream performance of these reference models from their parameter sizes. We then define the effective parameter size of the target LLM as the parameter size a reference model would require to achieve equivalent performance, and formalize capacity density as the ratio of the effective parameter size to the actual parameter size of the target LLM. Capacity density provides a unified framework for assessing both model effectiveness and efficiency. Our further analysis of recent open-source base LLMs reveals an empirical law (the densing law): the capacity density of LLMs grows exponentially over time. More specifically, on several widely used evaluation benchmarks, the capacity density of LLMs doubles approximately every three months. The law provides new perspectives to guide future LLM development, emphasizing the importance of improving capacity density to achieve optimal results with minimal computational overhead.
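The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the paper's fitted scaling law: `reference_score` is a hypothetical, monotonically increasing curve with made-up constants, standing in for the scaling law fitted on the reference models. The helper names (`effective_params`, `capacity_density`, `projected_density`) are also assumptions introduced here for illustration. The sketch inverts the reference curve to obtain an effective parameter size, divides by the actual parameter size to get capacity density, and applies the three-month doubling period stated in the abstract.

```python
import math


def reference_score(n_params: float) -> float:
    """Hypothetical reference curve (illustrative constants only):
    predicted benchmark score of a reference model with n_params parameters.
    It increases monotonically with n_params, which the inversion relies on."""
    return 0.25 + 0.6 / (1.0 + (5e9 / n_params) ** 0.5)


def effective_params(score: float, lo: float = 1e6, hi: float = 1e13,
                     iters: int = 200) -> float:
    """Parameter size a reference model would need to reach `score`,
    found by bisection in log space over [lo, hi]."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if reference_score(mid) < score:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def capacity_density(score: float, actual_params: float) -> float:
    """Ratio of effective parameter size to actual parameter size."""
    return effective_params(score) / actual_params


def projected_density(density_now: float, months: float,
                      doubling_months: float = 3.0) -> float:
    """Exponential growth with a fixed doubling period (3 months per the abstract)."""
    return density_now * 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # A hypothetical 2.5B-parameter model scoring 0.55 on this made-up curve.
    print(capacity_density(0.55, 2.5e9))
    # Density relative to today after 12 months of 3-month doublings.
    print(projected_density(1.0, 12))
```

On this made-up curve, a score of 0.55 corresponds to a 5B-parameter reference model, so a 2.5B-parameter model reaching that score has a capacity density of about 2. Four doublings over twelve months give a 16-fold increase in density.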
