EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation

Abstract

In this work, we reformulate the model compression problem as a customized compensation problem: given a compressed model, we introduce residual low-rank paths that compensate for compression errors under customized user requirements (e.g., tasks, compression ratios). This gives greater flexibility in adjusting overall capacity without being constrained by a specific compression format. However, naively applying SVD to derive the residual paths makes suboptimal use of the low-rank representation capacity. Instead, we propose Training-free Eigenspace Low-Rank Approximation (EoRA), a method that directly minimizes compression-induced errors without gradient-based training and completes optimization in minutes using a small amount of calibration data. EoRA projects compression errors into the eigenspace of the input activations and uses the eigenvalues to prioritize the reconstruction of high-importance error components. Moreover, EoRA can be seamlessly integrated with fine-tuning and quantization to further improve effectiveness and efficiency. EoRA consistently outperforms previous methods in compensating errors for compressed LLaMA2/3 models on various tasks, including language generation, commonsense reasoning, and math reasoning (e.g., 31.31%/12.88% and 9.69% improvements on ARC-Easy/ARC-Challenge and MathQA when compensating LLaMA3-8B that is quantized to 4-bit and pruned to 2:4 sparsity). EoRA offers a scalable, training-free solution for compensating compression errors, making it a powerful tool for deploying LLMs under various capacity and efficiency requirements.
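The eigenspace projection described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single linear layer with error matrix `delta_w` (original weight minus compressed weight, shape `d_out x d_in`) and calibration activations `x` (shape `d_in x n_samples`). It also assumes that "leveraging eigenvalues" means scaling each eigenvector direction by the square root of its eigenvalue before the truncated SVD. The function names, the `eps` floor, and the shapes are hypothetical choices made for illustration.

```python
import numpy as np


def plain_svd_compensation(delta_w, rank):
    """Baseline: rank-r truncated SVD of the raw compression error."""
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    b = u[:, :rank] * s[:rank]   # (d_out, r)
    a = vt[:rank]                # (r, d_in)
    return b, a


def eigenspace_compensation(delta_w, x, rank, eps=1e-8):
    """Sketch of an eigenspace-projected low-rank approximation.

    Assumption: the error is weighted by sqrt(eigenvalue) along each
    eigenvector of the activation second-moment matrix x @ x.T.
    """
    # Eigendecomposition of the symmetric matrix x @ x.T.
    eigvals, q = np.linalg.eigh(x @ x.T)
    # Floor tiny or negative eigenvalues so the inverse scaling stays finite.
    scale = np.sqrt(np.clip(eigvals, eps, None))

    # Project the error into the eigenspace and weight each direction.
    projected = (delta_w @ q) * scale          # delta_w @ Q @ diag(scale)

    # Truncated SVD in the projected space.
    u, s, vt = np.linalg.svd(projected, full_matrices=False)
    b = u[:, :rank] * s[:rank]                 # (d_out, r)

    # Undo the projection so that b @ a approximates delta_w directly.
    a = (vt[:rank] / scale) @ q.T              # (r, d_in)
    return b, a


def output_error(delta_w, b, a, x):
    """Frobenius norm of the uncompensated layer-output error on x."""
    return np.linalg.norm((delta_w - b @ a) @ x)


# Toy comparison on synthetic data with strongly anisotropic activations.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(32, 256)) * np.linspace(0.1, 5.0, 32)[:, None]
dw_demo = rng.normal(size=(16, 32))

b_svd, a_svd = plain_svd_compensation(dw_demo, rank=4)
b_eig, a_eig = eigenspace_compensation(dw_demo, x_demo, rank=4)

print("plain SVD output error :", output_error(dw_demo, b_svd, a_svd, x_demo))
print("eigenspace output error:", output_error(dw_demo, b_eig, a_eig, x_demo))
```

Under these assumptions, the weighted truncation minimizes the layer-output error on the calibration activations among all rank-`r` corrections, whereas plain SVD minimizes the error in the weights themselves. This is one way to read the abstract's claim that naive SVD underuses the low-rank capacity.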
