BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
April 25, 2025
Authors: Hongyu Wang, Shuming Ma, Furu Wei
cs.AI
Abstract
Efficient deployment of 1-bit Large Language Models (LLMs) is hindered by
activation outliers, which complicate quantization to low bit-widths. We
introduce BitNet v2, a novel framework enabling native 4-bit activation
quantization for 1-bit LLMs. To tackle outliers in attention and feed-forward
network activations, we propose H-BitLinear, a module applying an online
Hadamard transformation prior to activation quantization. This transformation
smooths sharp activation distributions into more Gaussian-like forms, suitable
for low-bit representation. Experiments show BitNet v2 trained from scratch
with 8-bit activations matches BitNet b1.58 performance. Crucially, BitNet v2
achieves minimal performance degradation when trained with native 4-bit
activations, significantly reducing memory footprint and computational cost for
batched inference.
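The sketch below illustrates the idea the abstract describes: rotate activations with a fast Walsh-Hadamard transform to spread out outliers, then quantize the rotated activations to 4 bits and the weights to ternary values before the matrix multiplication. It is a minimal illustration, not the paper's implementation; the function names, per-tensor absmax scaling, and power-of-two hidden-size requirement are assumptions made for brevity.

```python
# Illustrative sketch of an H-BitLinear-style forward pass (not the paper's code).
# Assumptions: Hadamard transform over the hidden dimension (power of two),
# per-tensor symmetric absmax INT4 activation quantization, and absmean
# ternary weight quantization as in BitNet b1.58.
import numpy as np


def hadamard_transform(x: np.ndarray) -> np.ndarray:
    """Fast Walsh-Hadamard transform over the last axis (size must be a power of two)."""
    n = x.shape[-1]
    assert n & (n - 1) == 0, "hidden size must be a power of two"
    h = x.copy()
    step = 1
    while step < n:
        # Butterfly: within each block of 2*step, combine the two halves.
        h = h.reshape(*h.shape[:-1], -1, 2 * step)
        a, b = h[..., :step], h[..., step:]
        h = np.concatenate([a + b, a - b], axis=-1).reshape(*x.shape)
        step *= 2
    return h / np.sqrt(n)  # orthonormal scaling


def quantize_activations_int4(x: np.ndarray):
    """Symmetric absmax quantization of activations to the 4-bit range [-8, 7]."""
    scale = np.abs(x).max() / 7.0 + 1e-8
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale


def quantize_weights_ternary(w: np.ndarray):
    """Absmean ternary ({-1, 0, +1}) weight quantization, BitNet b1.58 style."""
    scale = np.abs(w).mean() + 1e-8
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale


def h_bitlinear_forward(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Quantized matmul: ternary weights applied to INT4-quantized Hadamard(x)."""
    xh = hadamard_transform(x)                  # smooth outliers before quantizing
    xq, x_scale = quantize_activations_int4(xh)
    wq, w_scale = quantize_weights_ternary(w)
    return (xq @ wq.T) * (x_scale * w_scale)    # dequantize the accumulated result


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 256))           # batch of activations
    w = rng.standard_normal((512, 256))         # output_dim x hidden_dim weights
    print(h_bitlinear_forward(x, w).shape)      # (4, 512)
```

In the paper the Hadamard rotation is applied online inside the attention and feed-forward projections (the H-BitLinear module); here it is shown as a standalone forward pass only to make the transform-then-quantize order concrete.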