KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
December 8, 2024
Authors: Fan Wang, Juyong Jiang, Chansung Park, Sunghun Kim, Jing Tang
cs.AI
Abstract
The increasing sizes of large language models (LLMs) result in significant
computational overhead and memory usage when adapting these models to specific
tasks or domains. Various parameter-efficient fine-tuning (PEFT) methods have
been devised to mitigate these challenges by training a small set of parameters
for the task-specific updates of the model weights. Among PEFT methods, LoRA
stands out for its simplicity and efficiency, inspiring the development of a
series of variants. However, LoRA and its successors overlook the presence of
knowledge that is noisy or irrelevant to the target task, which degrades model
performance and leads to suboptimal results. To address this limitation, we
introduce Knowledge-aware Singular-value Adaptation (KaSA), a PEFT method that
leverages singular value decomposition (SVD) with knowledge-aware singular
values to dynamically activate knowledge based on its relevance to the task at
hand. We conduct extensive experiments across a range of LLMs on tasks spanning
natural language understanding (NLU), generation (NLG), instruction following,
and commonsense reasoning. The experimental results demonstrate that KaSA
consistently outperforms full fine-tuning (FFT) and 14 popular PEFT baselines
across 16 benchmarks and 4 synthetic datasets, underscoring our method's
efficacy and adaptability.
The source code of our method is available at
https://github.com/juyongjiang/KaSA.
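
For orientation, the contrast drawn in the abstract can be written out in symbols. The notation below (W_0, B, A, r, \Delta U, \Delta\Sigma, \Delta V, \sigma_i) is an assumption introduced for illustration and does not appear in the abstract. The first line is the standard LoRA parameterization with its scaling factor omitted. The second is only a generic SVD-shaped update of the kind the abstract describes, and may differ from the exact formulation used in the paper.

```latex
% LoRA: frozen pretrained weight plus a trainable rank-r product
W = W_0 + \Delta W, \qquad \Delta W = B A, \quad
B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

% SVD-shaped update (illustrative): trainable singular values weight each rank-one direction
\Delta W = \Delta U \, \Delta\Sigma \, \Delta V^{\top}, \qquad
\Delta\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_r)
```

Under this reading, a singular value \sigma_i close to zero suppresses its rank-one direction, which is one plausible interpretation of activating knowledge according to its relevance to the task.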