Scaling Laws for Floating Point Quantization Training

January 5, 2025
Authors: Xingwu Sun, Shuaipeng Li, Ruobing Xie, Weidong Han, Kan Wu, Zhen Yang, Yixing Li, An Wang, Shuai Li, Jinbao Xue, Yu Cheng, Yangyu Tao, Zhanhui Kang, Chengzhong Xu, Di Wang, Jie Jiang
cs.AI

Abstract

Low-precision training is considered an effective strategy for reducing both training and downstream inference costs. Previous scaling laws for precision mainly focus on integer quantization, paying less attention to the constituents of floating-point quantization, and therefore do not fit LLM losses well in this setting. In contrast, while floating-point quantization training is more commonly implemented in production, research on it has remained relatively superficial. In this paper, we thoroughly explore the effects of the floating-point quantization targets, exponent bits, mantissa bits, and the calculation granularity of the scaling factor on the training performance of LLM models. In addition to presenting an accurate unified scaling law for floating-point quantization, we offer valuable suggestions for the community: (1) Exponent bits contribute slightly more to model performance than mantissa bits. We provide the optimal exponent-mantissa bit ratio for different bit widths, which can serve as a reference for hardware manufacturers; (2) We discover the formation of a critical data size in low-precision LLM training. Training data beyond the critical data size instead degrades LLM performance; (3) The optimal floating-point quantization precision is directly proportional to the computational power, but across a wide range of computational power, we estimate that the best cost-performance precision lies between 4 and 8 bits.
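
To make the quantization components named in the abstract concrete, below is a minimal, illustrative Python sketch of fake floating-point quantization with configurable exponent bits, mantissa bits, and a block-wise scaling factor. It is not the paper's implementation: the function name `fp_quantize`, the default block size, and the IEEE-like range convention (top exponent reserved) are assumptions made purely for illustration.

```python
import numpy as np

def fp_quantize(x, exp_bits=4, man_bits=3, block_size=128):
    """Illustrative fake-quantization of a 1-D array to a low-precision
    floating-point format with `exp_bits` exponent bits and `man_bits`
    mantissa bits, using one scaling factor per block of `block_size`
    elements (block-wise granularity). Not the paper's method."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)

    # Assumed IEEE-like convention: bias = 2^(E-1) - 1, all-ones exponent
    # reserved, so the largest representable magnitude is 2^bias * (2 - 2^-M).
    bias = 2 ** (exp_bits - 1) - 1
    fmt_max = 2.0 ** bias * (2.0 - 2.0 ** (-man_bits))

    for start in range(0, x.size, block_size):
        block = x[start:start + block_size]
        # Per-block scale maps the block's max magnitude onto the format range.
        scale = np.max(np.abs(block)) / fmt_max if np.any(block) else 1.0
        scaled = block / scale
        # Snap each value to the nearest number representable with `man_bits`
        # fractional mantissa bits at its own (floored) binary exponent.
        exp = np.floor(np.log2(np.maximum(np.abs(scaled), 2.0 ** (-bias))))
        step = 2.0 ** (exp - man_bits)
        quant = np.round(scaled / step) * step
        # Clamp to the representable range, then undo the block scaling.
        out[start:start + block_size] = np.clip(quant, -fmt_max, fmt_max) * scale
    return out
```

Sweeping `exp_bits` and `man_bits` at a fixed total bit width in a simulator of this kind is one simple way to probe the exponent-mantissa trade-off that suggestion (1) refers to.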
