RandLoRA: Full-rank parameter-efficient fine-tuning of large models
February 3, 2025
Authors: Paul Albert, Frederic Z. Zhang, Hemanth Saratchandran, Cristian Rodriguez-Opazo, Anton van den Hengel, Ehsan Abbasnejad
cs.AI
Abstract
Low-Rank Adaptation (LoRA) and its variants have shown impressive results in
reducing the number of trainable parameters and memory requirements of large
transformer networks while maintaining fine-tuning performance. However, the
low-rank nature of the weight update inherently limits the representation power
of fine-tuned models, potentially compromising performance on complex tasks.
This raises a critical question: when a performance gap between LoRA and
standard fine-tuning is observed, is it due to the reduced number of trainable
parameters or the rank deficiency? This paper aims to answer this question by
introducing RandLoRA, a parameter-efficient method that performs full-rank
updates using learned linear combinations of low-rank, non-trainable random
matrices. Our method limits the number of trainable parameters by restricting
optimization to diagonal scaling matrices applied to the fixed random matrices.
This allows us to effectively overcome the low-rank limitations while
maintaining parameter and memory efficiency during training. Through extensive
experimentation across vision, language, and vision-language benchmarks, we
systematically evaluate the limitations of LoRA and existing random basis
methods. Our findings reveal that full-rank updates are beneficial across
vision and language tasks individually, and even more so for vision-language
tasks, where RandLoRA significantly reduces -- and sometimes eliminates -- the
performance gap between standard fine-tuning and LoRA, demonstrating its
efficacy.
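To make the parameterization described above concrete, here is a minimal PyTorch sketch of the idea: a frozen linear layer is augmented with a weight update built as a sum of fixed, non-trainable random low-rank factor pairs, each rescaled by a small trainable diagonal matrix. The class name, shapes, initialization scales, and number of bases are illustrative assumptions, not the paper's reference implementation.

```python
# Hedged sketch of a RandLoRA-style layer: fixed random low-rank bases,
# trainable diagonal scalings only. Illustrative assumptions throughout.
import torch
import torch.nn as nn


class RandLoRALinearSketch(nn.Module):
    def __init__(self, base_linear: nn.Linear, rank: int = 4, num_bases: int = 8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen

        out_f, in_f = base_linear.out_features, base_linear.in_features
        # Fixed (non-trainable) random low-rank factors B_i (out_f x rank) and A_i (rank x in_f).
        self.register_buffer("B", torch.randn(num_bases, out_f, rank) / rank ** 0.5)
        self.register_buffer("A", torch.randn(num_bases, rank, in_f) / in_f ** 0.5)
        # Trainable diagonal scalings: one length-`rank` vector per basis.
        self.lam = nn.Parameter(torch.zeros(num_bases, rank))

    def delta_weight(self) -> torch.Tensor:
        # sum_i B_i @ diag(lam_i) @ A_i; the sum of num_bases rank-`rank` terms
        # can reach full rank once num_bases * rank >= min(out_f, in_f).
        scaled_A = self.lam.unsqueeze(-1) * self.A               # (n, rank, in_f)
        return torch.einsum("nor,nri->oi", self.B, scaled_A)    # (out_f, in_f)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + nn.functional.linear(x, self.delta_weight())
```

In this sketch only the `num_bases * rank` diagonal entries are optimized per layer, which keeps the trainable-parameter count small while the summed update is not constrained to a single low-rank subspace.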