KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models

Abstract

The growing size of large language models (LLMs) incurs significant computational overhead and memory usage when these models are adapted to specific tasks or domains. Various parameter-efficient fine-tuning (PEFT) methods have been devised to mitigate these challenges by training only a small set of parameters for the task-specific updates of the model weights. Among PEFT methods, LoRA stands out for its simplicity and efficiency, and it has inspired a series of variants. However, LoRA and its successors do not account for knowledge that is noisy or irrelevant to the target task, which degrades model performance and leads to suboptimal results. To address this limitation, we introduce Knowledge-aware Singular-value Adaptation (KaSA), a PEFT method that leverages singular value decomposition (SVD) with knowledge-aware singular values to dynamically activate knowledge based on its relevance to the task at hand. We conduct extensive experiments across a range of LLMs on tasks spanning natural language understanding (NLU), natural language generation (NLG), instruction following, and commonsense reasoning. The experimental results demonstrate that KaSA consistently outperforms full fine-tuning (FFT) and 14 popular PEFT baselines across 16 benchmarks and 4 synthetic datasets, underscoring our method's efficacy and adaptability. The source code of our method is available at https://github.com/juyongjiang/KaSA.
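
To make the phrase "knowledge-aware singular values" more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the task-specific update is written in SVD form, delta_W = U diag(sigma) V^T, so that each singular value scales one rank-one component and a value of zero switches that component off. The function names `svd_form_update` and `adapted_forward`, the toy matrices, and the hand-picked `sigma` are all hypothetical; in an actual PEFT setup these quantities would presumably be learned during fine-tuning.

```python
# Illustrative sketch only; names and values are assumptions, not the KaSA code.
# Assumed form of the update: delta_W = U @ diag(sigma) @ V^T,
# where sigma[i] controls how strongly rank-one component i is activated.

def svd_form_update(U, sigma, V):
    """Return delta_W = sum_i sigma[i] * outer(U[:, i], V[:, i]).

    U is m x r and V is n x r (lists of rows); sigma has length r.
    """
    m, n, r = len(U), len(V), len(sigma)
    delta = [[0.0] * n for _ in range(m)]
    for i in range(r):
        for a in range(m):
            for b in range(n):
                delta[a][b] += sigma[i] * U[a][i] * V[b][i]
    return delta


def adapted_forward(W, delta, x):
    """Compute (W + delta_W) @ x for one input vector x; W stays frozen."""
    return [
        sum((W[a][b] + delta[a][b]) * x[b] for b in range(len(x)))
        for a in range(len(W))
    ]


# Toy example: a 2x2 frozen weight and a rank-2 update with orthonormal columns.
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
sigma = [0.5, 0.0]  # the second component is switched off

delta_W = svd_form_update(U, sigma, V)
y = adapted_forward(W, delta_W, [2.0, 3.0])
```

In this toy case only the first component contributes to the update, so the first output coordinate is rescaled while the second passes through the frozen weight unchanged.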
