

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

February 24, 2025
Authors: Chenghao Fan, Zhenyi Lu, Sichen Liu, Xiaoye Qu, Wei Wei, Chengfeng Gu, Yu Cheng
cs.AI

Abstract

While Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for Large Language Models (LLMs), its performance often falls short of Full Fine-Tuning (Full FT). Current methods optimize LoRA by initializing with static singular value decomposition (SVD) subsets, leading to suboptimal leveraging of pre-trained knowledge. Another path for improving LoRA is incorporating a Mixture-of-Experts (MoE) architecture. However, weight misalignment and complex gradient dynamics make it challenging to adopt SVD prior to the LoRA MoE architecture. To mitigate these issues, we propose Great LoRA Mixture-of-Expert (GOAT), a framework that (1) adaptively integrates relevant priors using an SVD-structured MoE, and (2) aligns optimization with full fine-tuned MoE by deriving a theoretical scaling factor. We demonstrate that proper scaling, without modifying the architecture or training algorithms, boosts LoRA MoE's efficiency and performance. Experiments across 25 datasets, including natural language understanding, commonsense reasoning, image classification, and natural language generation, demonstrate GOAT's state-of-the-art performance, closing the gap with Full FT.

