Densing Law of LLMs
December 5, 2024
Authors: Chaojun Xiao, Jie Cai, Weilin Zhao, Guoyang Zeng, Xu Han, Zhiyuan Liu, Maosong Sun
cs.AI
Abstract
Large Language Models (LLMs) have emerged as a milestone in artificial
intelligence, and their performance can improve as the model size increases.
However, this scaling poses major challenges for training and inference
efficiency, particularly when deploying LLMs in resource-constrained
environments, and the scaling trend is becoming increasingly unsustainable.
This paper introduces the concept of "capacity density" as a new
metric to evaluate the quality of LLMs across different scales and
describes the trend of LLMs in terms of both effectiveness and efficiency. To
calculate the capacity density of a given target LLM, we first introduce a set
of reference models and develop a scaling law to predict the downstream
performance of these reference models based on their parameter sizes. We then
define the effective parameter size of the target LLM as the parameter
size required by a reference model to achieve equivalent performance, and
formalize the capacity density as the ratio of the effective parameter size to
the actual parameter size of the target LLM. Capacity density provides a
unified framework for assessing both model effectiveness and efficiency. Our
further analysis of recent open-source base LLMs reveals an empirical law (the
densing law): the capacity density of LLMs grows exponentially over time.
More specifically, evaluated on several widely used benchmarks, the
capacity density of LLMs doubles approximately every three months. The law
provides a new perspective for guiding future LLM development, emphasizing the
importance of improving capacity density to achieve optimal results with
minimal computational overhead.
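To make the definition concrete, here is a minimal sketch of the capacity-density computation described in the abstract. The sigmoid scaling law and its coefficients below are hypothetical placeholders, not the paper's actual fit: a downstream score is modeled as a monotone function of reference-model parameter count, that function is numerically inverted at the target model's score to obtain the effective parameter size, and the density is the ratio of effective to actual size.

```python
# Minimal sketch of the capacity-density computation (illustrative only).
# The sigmoid scaling law and its coefficients are hypothetical placeholders;
# the paper fits its own scaling law on a set of reference models.
import math
from scipy.optimize import brentq

def predicted_score(n_params: float) -> float:
    """Assumed fitted scaling law: downstream benchmark score as a
    saturating, monotonically increasing function of parameter count."""
    upper, mu, scale = 0.9, math.log(7e9), 1.5  # hypothetical fit coefficients
    return upper / (1.0 + math.exp(-(math.log(n_params) - mu) / scale))

def effective_params(score: float, lo: float = 1e6, hi: float = 1e14) -> float:
    """Invert the scaling law: the reference-model parameter size that
    would achieve the same downstream score as the target model."""
    return brentq(lambda n: predicted_score(n) - score, lo, hi)

def capacity_density(score: float, actual_params: float) -> float:
    """Capacity density = effective parameter size / actual parameter size."""
    return effective_params(score) / actual_params

# Example: a 2.4B-parameter model that matches the predicted score of a
# 7B reference model has capacity density ~ 7e9 / 2.4e9 ≈ 2.9.
print(capacity_density(predicted_score(7e9), 2.4e9))
```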
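The growth trend itself can be stated as a simple exponential. A minimal formalization consistent with the abstract's claim, where the symbols rho(t), A, and B are notational assumptions introduced here (rho(t) denotes the maximum capacity density observed at time t):

```latex
\ln \rho(t) \approx A\,t + B,
\qquad \text{doubling time} = \frac{\ln 2}{A} \approx 3 \text{ months}
```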