EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
October 28, 2024
Authors: Shih-Yang Liu, Huck Yang, Chien-Yi Wang, Nai Chit Fung, Hongxu Yin, Charbel Sakr, Saurav Muralidharan, Kwang-Ting Cheng, Jan Kautz, Yu-Chiang Frank Wang, Pavlo Molchanov, Min-Hung Chen
cs.AI
Abstract
In this work, we re-formulate the model compression problem as a customized
compensation problem: given a compressed model, we aim to introduce
residual low-rank paths to compensate for compression errors under customized
requirements from users (e.g., tasks, compression ratios), resulting in greater
flexibility in adjusting overall capacity without being constrained by specific
compression formats. However, naively applying SVD to derive residual paths
causes suboptimal utilization of the low-rank representation capacity. Instead,
we propose Training-free Eigenspace Low-Rank Approximation (EoRA), a method
that directly minimizes compression-induced errors without requiring
gradient-based training, achieving fast optimization in minutes using a small
amount of calibration data. EoRA projects compression errors into the
eigenspace of input activations, leveraging eigenvalues to effectively
prioritize the reconstruction of high-importance error components. Moreover,
EoRA can be seamlessly integrated with fine-tuning and quantization to further
improve effectiveness and efficiency. EoRA consistently outperforms previous
methods in compensating errors for compressed LLaMA2/3 models on various tasks,
such as language generation, commonsense reasoning, and math reasoning tasks
(e.g., 31.31%/12.88% improvements on ARC-Easy/ARC-Challenge and a 9.69%
improvement on MathQA when compensating a LLaMA3-8B model quantized to 4-bit
and pruned to 2:4 sparsity). EoRA offers a scalable, training-free solution
for compensating compression errors, making it a powerful tool for deploying
LLMs under a variety of capacity and efficiency requirements.
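To make the core idea concrete: the abstract describes projecting the compression error into the eigenspace of input activations and using the eigenvalues to prioritize which error components a rank-r residual path should reconstruct. Below is a minimal NumPy sketch of that idea under our own assumptions; the function name, argument names, and the particular sqrt-eigenvalue weighting are illustrative choices, not necessarily the paper's exact formulation.

```python
import numpy as np

def eora_sketch(W, W_compressed, X_calib, rank):
    """Hypothetical sketch of eigenspace low-rank error compensation.

    W            : original layer weight, shape (d_out, d_in)
    W_compressed : quantized and/or pruned weight, same shape
    X_calib      : calibration activations, shape (d_in, n_samples)
    rank         : budget r for the residual low-rank path
    Returns (B, A) such that B @ A approximates W - W_compressed,
    weighted toward directions the calibration data actually excites.
    """
    # Compression error that the residual path should absorb.
    dW = W - W_compressed

    # Eigendecomposition of the input-activation covariance: eigenvalues
    # measure how strongly each input direction is activated in practice.
    cov = X_calib @ X_calib.T / X_calib.shape[1]
    eigvals, Q = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 1e-8, None)  # numerical guard
    s = np.sqrt(eigvals)

    # Rotate the error into the eigenspace and scale each direction by
    # sqrt(eigenvalue), so the SVD truncation below preferentially keeps
    # the error components with the largest impact on typical inputs.
    dW_eig = (dW @ Q) * s

    # Best rank-r approximation in the weighted eigenspace.
    U, sigma, Vt = np.linalg.svd(dW_eig, full_matrices=False)
    B = U[:, :rank] * sigma[:rank]   # (d_out, r)
    A = (Vt[:rank] / s) @ Q.T        # (r, d_in), undo scaling and rotation

    return B, A

# At inference, the compensated layer computes:
#   y = W_compressed @ x + B @ (A @ x)
```

Because B @ A is kept as an explicit residual path rather than folded back into W_compressed, the compressed format (e.g., 4-bit or 2:4-sparse kernels) stays intact, which reflects the format-agnostic flexibility the abstract emphasizes.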