

EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation

October 28, 2024
Authors: Shih-Yang Liu, Huck Yang, Chien-Yi Wang, Nai Chit Fung, Hongxu Yin, Charbel Sakr, Saurav Muralidharan, Kwang-Ting Cheng, Jan Kautz, Yu-Chiang Frank Wang, Pavlo Molchanov, Min-Hung Chen
cs.AI

Abstract

In this work, we re-formulate the model compression problem into the customized compensation problem: given a compressed model, we aim to introduce residual low-rank paths to compensate for compression errors under customized requirements from users (e.g., tasks, compression ratios), resulting in greater flexibility in adjusting overall capacity without being constrained by specific compression formats. However, naively applying SVD to derive residual paths causes suboptimal utilization of the low-rank representation capacity. Instead, we propose Training-free Eigenspace Low-Rank Approximation (EoRA), a method that directly minimizes compression-induced errors without requiring gradient-based training, achieving fast optimization in minutes using a small amount of calibration data. EoRA projects compression errors into the eigenspace of input activations, leveraging eigenvalues to effectively prioritize the reconstruction of high-importance error components. Moreover, EoRA can be seamlessly integrated with fine-tuning and quantization to further improve effectiveness and efficiency. EoRA consistently outperforms previous methods in compensating errors for compressed LLaMA2/3 models on tasks spanning language generation, commonsense reasoning, and math reasoning (e.g., 31.31%/12.88% and 9.69% improvements on ARC-Easy/ARC-Challenge and MathQA when compensating a LLaMA3-8B model quantized to 4 bits and pruned to 2:4 sparsity). EoRA offers a scalable, training-free solution to compensate for compression errors, making it a powerful tool for deploying LLMs under varied capacity and efficiency requirements.
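The eigenspace projection described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' released implementation: the function name `eora_compensate` and all variable names are hypothetical, and it assumes the compensation for one linear layer is computed from the weight error `delta_w = W - W_compressed` and a batch of calibration activations. Scaling the error's columns by the square roots of the activation-covariance eigenvalues before the truncated SVD makes the rank budget favor directions that the inputs actually excite, which is the stated intuition behind EoRA.

```python
import numpy as np

def eora_compensate(delta_w, calib_x, rank):
    """Sketch of eigenspace low-rank error compensation.

    delta_w : (out, in) weight error of one layer, W - W_compressed.
    calib_x : (in, n_samples) calibration input activations.
    rank    : rank budget for the residual path.
    Returns B (out, rank) and A (rank, in) with B @ A ~ delta_w,
    where the approximation error is weighted by input statistics.
    """
    # Eigendecomposition of the input-activation covariance.
    cov = calib_x @ calib_x.T
    eigvals, q = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 1e-8, None)  # guard against degenerate directions
    scale = np.sqrt(eigvals)

    # Project the error into the eigenspace, weighting each eigen-direction
    # by sqrt(eigenvalue) so the SVD prioritizes high-energy components.
    proj = (delta_w @ q) * scale          # (out, in), columns scaled

    # Truncated SVD in the weighted space, then undo the weighting.
    u, s, vt = np.linalg.svd(proj, full_matrices=False)
    b = u[:, :rank] * s[:rank]            # (out, rank)
    a = (vt[:rank] / scale) @ q.T         # (rank, in)
    return b, a
```

At full rank the factors reconstruct `delta_w` exactly; at lower ranks they trade off error in proportion to how strongly each direction appears in the calibration data. The resulting `B @ A` is a residual path added in parallel to the compressed layer, requiring no gradient-based training.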
