Optimizing Large Language Model Training Using FP4 Quantization
January 28, 2025
Authors: Ruizhe Wang, Yeyun Gong, Xiao Liu, Guoshuai Zhao, Ziyue Yang, Baining Guo, Zhengjun Zha, Peng Cheng
cs.AI
Abstract
The growing computational demands of training large language models (LLMs)
necessitate more efficient methods. Quantized training presents a promising
solution by enabling low-bit arithmetic operations to reduce these costs. While
FP8 precision has demonstrated feasibility, leveraging FP4 remains a challenge
due to significant quantization errors and limited representational capacity.
This work introduces the first FP4 training framework for LLMs, addressing
these challenges with two key innovations: a differentiable quantization
estimator for precise weight updates and an outlier clamping and compensation
strategy to prevent activation collapse. To ensure stability, the framework
integrates a mixed-precision training scheme and vector-wise quantization.
Experimental results demonstrate that our FP4 framework achieves accuracy
comparable to BF16 and FP8, with minimal degradation, scaling effectively to
13B-parameter LLMs trained on up to 100B tokens. With the emergence of
next-generation hardware supporting FP4, our framework sets a foundation for
efficient ultra-low precision training.
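To make the first key innovation named in the abstract more concrete, the sketch below shows one possible form a "differentiable quantization estimator" could take: a custom autograd function that rounds to a uniform grid in the forward pass but backpropagates through a smooth surrogate of the rounding function instead of the constant straight-through gradient. The tanh-based surrogate, the step size, and the sharpness parameter `k` are assumptions made for illustration; this is not the authors' implementation.

```python
# Minimal sketch (assumed, not the paper's code): a differentiable
# quantization estimator with a smooth surrogate backward pass.
import torch


class DifferentiableQuantEstimator(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, step: float = 0.5, k: float = 5.0):
        ctx.save_for_backward(w)
        ctx.step, ctx.k = step, k
        # Hard rounding to a uniform grid in the forward pass.
        return torch.round(w / step) * step

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        step, k = ctx.step, ctx.k
        # Derivative of a smooth approximation of round(): steep tanh ramps
        # around each grid midpoint, instead of the constant-1 STE gradient.
        frac = w / step - torch.floor(w / step) - 0.5
        surrogate_grad = 0.5 * k * (1.0 - torch.tanh(k * frac) ** 2)
        return grad_out * surrogate_grad, None, None


if __name__ == "__main__":
    w = torch.randn(8, requires_grad=True)
    y = DifferentiableQuantEstimator.apply(w)
    y.sum().backward()
    print(w.grad)  # input-dependent gradients rather than all-ones
```

The design point being illustrated is that the backward pass reflects where a weight sits relative to the nearest quantization boundary, which is what allows more precise weight updates than a plain straight-through estimator.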
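Similarly, the following sketch illustrates how vector-wise (per-row) quantization to an FP4-like grid could be combined with outlier clamping and a high-precision compensation term, in the spirit of the second innovation. The E2M1 value grid, the quantile threshold, and all function names are assumptions for this example only.

```python
# Minimal sketch (assumed, not the paper's code): vector-wise FP4 (E2M1)
# quantization with quantile-based outlier clamping and sparse compensation.
import torch

# Magnitudes representable by an E2M1 FP4 format (plus sign).
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_MAX = 6.0


def quantize_fp4_vectorwise(x: torch.Tensor, clamp_quantile: float = 0.99):
    """Quantize each row of `x` to FP4, returning the dequantized tensor
    plus a sparse high-precision residual holding the clamped outliers."""
    # 1. Outlier clamping: cap each row at its `clamp_quantile` magnitude.
    thresh = torch.quantile(x.abs(), clamp_quantile, dim=-1, keepdim=True)
    x_clamped = torch.clamp(x, -thresh, thresh)

    # 2. Compensation: keep the clipped-away part in higher precision so it
    #    can be added back after the low-precision matmul.
    residual = (x - x_clamped).to_sparse()

    # 3. Vector-wise scaling: map each row's max magnitude onto FP4_MAX.
    scale = x_clamped.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / FP4_MAX

    # 4. Round each scaled value to the nearest entry of the FP4 grid.
    scaled = (x_clamped / scale).unsqueeze(-1)            # [..., N, 1]
    idx = (scaled.abs() - FP4_GRID).abs().argmin(dim=-1)  # nearest magnitude
    q = FP4_GRID[idx] * scaled.squeeze(-1).sign()

    return q * scale, residual


if __name__ == "__main__":
    w = torch.randn(4, 16)
    w_q, res = quantize_fp4_vectorwise(w)
    # Remaining error is only the FP4 rounding error on the inlier values;
    # the clamped outliers are fully recovered by the compensation term.
    print((w - (w_q + res.to_dense())).abs().max())
```

Keeping the clipped residual in higher precision is what prevents the extreme activations from being destroyed by the narrow FP4 range, which is the failure mode ("activation collapse") the abstract refers to.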