BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
April 25, 2025
Authors: Hongyu Wang, Shuming Ma, Furu Wei
cs.AI
Abstract
Efficient deployment of 1-bit Large Language Models (LLMs) is hindered by
activation outliers, which complicate quantization to low bit-widths. We
introduce BitNet v2, a novel framework enabling native 4-bit activation
quantization for 1-bit LLMs. To tackle outliers in attention and feed-forward
network activations, we propose H-BitLinear, a module applying an online
Hadamard transformation prior to activation quantization. This transformation
smooths sharp activation distributions into more Gaussian-like forms, suitable
for low-bit representation. Experiments show BitNet v2 trained from scratch
with 8-bit activations matches BitNet b1.58 performance. Crucially, BitNet v2
achieves minimal performance degradation when trained with native 4-bit
activations, significantly reducing memory footprint and computational cost for
batched inference.
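The sketch below illustrates the idea the abstract describes: rotate activations with a fast Walsh-Hadamard transform to spread out outliers, then quantize the rotated activations to 4 bits and the weights to ternary values before the matrix multiplication. It is a minimal illustration, not the paper's implementation; the function names, per-tensor absmax scaling, and power-of-two hidden-size requirement are assumptions made for brevity.

```python
# Illustrative sketch of an H-BitLinear-style forward pass (not the paper's code).
# Assumptions: Hadamard transform over the hidden dimension (power of two),
# per-tensor symmetric absmax INT4 activation quantization, and absmean
# ternary weight quantization as in BitNet b1.58.
import numpy as np


def hadamard_transform(x: np.ndarray) -> np.ndarray:
    """Fast Walsh-Hadamard transform over the last axis (size must be a power of two)."""
    n = x.shape[-1]
    assert n & (n - 1) == 0, "hidden size must be a power of two"
    h = x.copy()
    step = 1
    while step < n:
        # Butterfly: within each block of 2*step, combine the two halves.
        h = h.reshape(*h.shape[:-1], -1, 2 * step)
        a, b = h[..., :step], h[..., step:]
        h = np.concatenate([a + b, a - b], axis=-1).reshape(*x.shape)
        step *= 2
    return h / np.sqrt(n)  # orthonormal scaling


def quantize_activations_int4(x: np.ndarray):
    """Symmetric absmax quantization of activations to the 4-bit range [-8, 7]."""
    scale = np.abs(x).max() / 7.0 + 1e-8
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale


def quantize_weights_ternary(w: np.ndarray):
    """Absmean ternary ({-1, 0, +1}) weight quantization, BitNet b1.58 style."""
    scale = np.abs(w).mean() + 1e-8
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale


def h_bitlinear_forward(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Quantized matmul: ternary weights applied to INT4-quantized Hadamard(x)."""
    xh = hadamard_transform(x)                  # smooth outliers before quantizing
    xq, x_scale = quantize_activations_int4(xh)
    wq, w_scale = quantize_weights_ternary(w)
    return (xq @ wq.T) * (x_scale * w_scale)    # dequantize the accumulated result


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 256))           # batch of activations
    w = rng.standard_normal((512, 256))         # output_dim x hidden_dim weights
    print(h_bitlinear_forward(x, w).shape)      # (4, 512)
```

In the paper the Hadamard rotation is applied online inside the attention and feed-forward projections (the H-BitLinear module); here it is shown as a standalone forward pass only to make the transform-then-quantize order concrete.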