Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
November 26, 2024
Authors: Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu
cs.AI
Abstract
We reveal that low-bit quantization favors undertrained large language models
(LLMs) by observing that models with larger sizes or fewer training tokens
experience less quantization-induced degradation (QiD) when applying low-bit
quantization, whereas smaller models with extensive training tokens suffer
significant QiD. To gain deeper insights into this trend, we study over 1500
quantized LLM checkpoints of various sizes and at different training levels
(undertrained or fully trained) in a controlled setting, deriving scaling laws
for understanding the relationship between QiD and factors such as the number
of training tokens, model size and bit width.
With the derived scaling laws, we propose a novel perspective that we can use
QiD to measure an LLM's training levels and determine the number of training
tokens required for fully training LLMs of various sizes. Moreover, we use the
scaling laws to predict the quantization performance of different-sized LLMs
trained with 100 trillion tokens. Our projection shows that the low-bit
quantization performance of future models, which are expected to be trained
with over 100 trillion tokens, may NOT be desirable. This poses a potential
challenge for low-bit quantization in the future and highlights the need for
awareness of a model's training level when evaluating low-bit quantization
research. To facilitate future research on this problem, we release all the
1500+ quantized checkpoints used in this work at
https://huggingface.co/Xu-Ouyang.
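As a rough illustration of the kind of scaling law the abstract describes (the exact functional form, exponents, and fitted constants come from the paper itself and are not given here, so everything below is an assumption), one plausible shape is a multiplicative power law in which QiD grows with the number of training tokens D and shrinks with model size N and bit width P:

\Delta_{\text{QiD}}(N, D, P) \;\approx\; k \cdot \frac{D^{\beta}}{N^{\alpha}\, P^{\gamma}}, \qquad k, \alpha, \beta, \gamma > 0

Under a form like this, holding N and P fixed, QiD increases monotonically with D. That is consistent with the abstract's two main observations: undertrained checkpoints (small D relative to N) show little degradation under low-bit quantization, while models trained on 100 trillion tokens or more are projected to suffer substantially larger degradation.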