Densing Law of LLMs
December 5, 2024
Authors: Chaojun Xiao, Jie Cai, Weilin Zhao, Guoyang Zeng, Xu Han, Zhiyuan Liu, Maosong Sun
cs.AI
Abstract
Large Language Models (LLMs) have emerged as a milestone in artificial
intelligence, and their performance can improve as the model size increases.
However, this scaling brings great challenges to training and inference
efficiency, particularly for deploying LLMs in resource-constrained
environments, and the scaling trend is becoming increasingly unsustainable.
This paper introduces the concept of "capacity density" as a new
metric to evaluate the quality of LLMs across different scales and
describes the trend of LLMs in terms of both effectiveness and efficiency. To
calculate the capacity density of a given target LLM, we first introduce a set
of reference models and develop a scaling law to predict the downstream
performance of these reference models based on their parameter sizes. We then
define the effective parameter size of the target LLM as the parameter
size required by a reference model to achieve equivalent performance, and
formalize the capacity density as the ratio of the effective parameter size to
the actual parameter size of the target LLM. Capacity density provides a
unified framework for assessing both model effectiveness and efficiency. Our
further analysis of recent open-source base LLMs reveals an empirical law (the
densing law) that the capacity density of LLMs grows exponentially over time.
More specifically, using some widely used benchmarks for evaluation, the
capacity density of LLMs doubles approximately every three months. The law
provides new perspectives to guide future LLM development, emphasizing the
importance of improving capacity density to achieve optimal results with
minimal computational overhead.
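In symbols, the construction described in the abstract can be sketched as follows (the notation here is mine, not necessarily the paper's): let $f$ be the scaling-law fit of downstream performance against parameter size over the reference models, $S_{\mathcal{M}}$ the measured performance of a target model $\mathcal{M}$, and $N_{\mathcal{M}}$ its actual parameter count. Then

$$
\widehat{N}(\mathcal{M}) = f^{-1}\bigl(S_{\mathcal{M}}\bigr),
\qquad
\rho(\mathcal{M}) = \frac{\widehat{N}(\mathcal{M})}{N_{\mathcal{M}}},
$$

and the densing law states that the highest capacity density available at time $t$ grows exponentially, roughly $\rho_{\max}(t) \propto 2^{\,t/T}$ with a doubling period $T$ of about three months on the benchmarks used.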
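The same computation can also be sketched numerically. The snippet below is an illustrative approximation, not the authors' released code: the sigmoid-in-log-parameters scaling form, the reference-model sizes, and the scores are all assumed for the example.

```python
# Minimal sketch of capacity density: fit a performance-vs-size curve on
# reference models, invert it to get the effective parameter size of a
# target model, and take the ratio to its actual size.
import numpy as np
from scipy.optimize import curve_fit

def perf(log_n, c, k, x0):
    # Assumed scaling form: benchmark score as a sigmoid of log parameter count.
    return c / (1.0 + np.exp(-k * (log_n - x0)))

# Hypothetical reference models: parameter counts and average benchmark scores.
ref_params = np.array([0.5e9, 1e9, 3e9, 7e9, 13e9, 34e9, 70e9])
ref_scores = np.array([0.28, 0.33, 0.41, 0.48, 0.53, 0.60, 0.65])

# Step 1: fit the reference scaling curve.
(c, k, x0), _ = curve_fit(perf, np.log(ref_params), ref_scores,
                          p0=[0.8, 1.0, np.log(7e9)])

def capacity_density(target_params, target_score):
    """Effective parameter size (inverse of the fitted curve) over actual size."""
    log_n_eff = x0 - np.log(c / target_score - 1.0) / k  # invert the sigmoid
    return float(np.exp(log_n_eff) / target_params)

# Example: a hypothetical 3B model that scores like a ~7B reference model
# has capacity density of about 7/3 ≈ 2.3.
print(capacity_density(3e9, perf(np.log(7e9), c, k, x0)))

# Densing law as stated in the abstract: density doubles roughly every
# three months, so over a year it grows by about 2 ** (12 / 3) = 16x.
```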