A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models
October 17, 2024
Authors: Qiaoyu Tang, Le Yu, Bowen Yu, Hongyu Lin, Keming Lu, Yaojie Lu, Xianpei Han, Le Sun
cs.AI
Abstract
Post-training has emerged as a crucial paradigm for adapting large-scale
pre-trained models to various tasks, whose effects are fully reflected by delta
parameters (i.e., the disparity between post-trained and pre-trained
parameters). While numerous studies have explored delta parameter properties
via operations like pruning, quantization, low-rank approximation, and
extrapolation, a unified framework for systematically examining these
characteristics has been lacking. In this paper, we propose a novel perspective
based on Riemann sum approximation of the loss function to elucidate delta
parameter editing operations. Our analysis categorizes existing methods into
three classes based on their post-editing performance: competitive, decreased,
and improved, explaining how they are expressed by the Riemann sum
approximation term and how they alter the model performance. Extensive
experiments on both visual and language models, including ViT, LLaMA 3, Qwen 2,
and Mistral, corroborate our theoretical findings. Furthermore, we introduce
extensions to existing techniques like DARE and BitDelta, highlighting their
limitations in leveraging the properties of delta parameters and reorganizing
them into general expressions to enhance the applicability and effectiveness of
delta parameter editing in post-trained models.
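The Riemann-sum perspective mentioned in the abstract can be written out explicitly. The following is a minimal sketch in our own notation ($\theta_{\text{post}}$ for the post-trained parameters, $\tilde{\theta}$ for the edited model, $K$ for the number of partition points); it illustrates the general idea rather than the paper's exact formulation. The loss gap between the edited model and the original post-trained model is a line integral of the gradient, which a $K$-term Riemann sum approximates:

$$
\mathcal{L}(\tilde{\theta}) - \mathcal{L}(\theta_{\text{post}})
= \int_0^1 \nabla \mathcal{L}\big(\theta_{\text{post}} + t\,(\tilde{\theta} - \theta_{\text{post}})\big)^{\top} (\tilde{\theta} - \theta_{\text{post}})\, dt
\approx \frac{1}{K} \sum_{k=1}^{K} \nabla \mathcal{L}\big(\theta^{(k)}\big)^{\top} (\tilde{\theta} - \theta_{\text{post}}),
\qquad \theta^{(k)} = \theta_{\text{post}} + \tfrac{k}{K}\,(\tilde{\theta} - \theta_{\text{post}}).
$$

On this reading, an editing operation is competitive when the approximation term stays near zero, decreased when it is positive (the loss rises), and improved when it is negative.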
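To make the editing operations concrete, here is a short, hypothetical Python sketch of the two techniques named in the abstract, applied to delta parameters $\Delta\theta = \theta_{\text{post}} - \theta_{\text{pre}}$. The function names (`dare_edit`, `bitdelta_edit`) and the per-tensor mean-absolute-value scale are our illustrative choices, not reference implementations; in particular, BitDelta additionally distills its scales, which this sketch omits.

```python
# Hypothetical sketch of delta parameter editing; names and the
# mean-|delta| scale are illustrative assumptions, not the papers' code.
import torch

def delta(pre: torch.Tensor, post: torch.Tensor) -> torch.Tensor:
    """Delta parameters: the difference between post-trained and pre-trained weights."""
    return post - pre

def dare_edit(d: torch.Tensor, p: float) -> torch.Tensor:
    """DARE-style edit: randomly drop a fraction p of delta entries,
    then rescale the survivors by 1/(1-p) so the expected delta is unchanged."""
    mask = (torch.rand_like(d) >= p).to(d.dtype)
    return d * mask / (1.0 - p)

def bitdelta_edit(d: torch.Tensor) -> torch.Tensor:
    """BitDelta-style edit: keep only the sign of each delta entry plus one
    per-tensor scale (here the mean absolute delta; BitDelta itself further
    distills the scales, which this sketch omits)."""
    scale = d.abs().mean()
    return torch.sign(d) * scale

# Usage: rebuild an edited post-trained weight from the pre-trained weight.
pre = torch.randn(4, 4)
post = pre + 0.01 * torch.randn(4, 4)  # stand-in for a post-training update
d = delta(pre, post)
edited = pre + dare_edit(d, p=0.9)     # or: pre + bitdelta_edit(d)
```

Both edits are instances of the same pattern the paper studies: replace $\Delta\theta$ with a transformed $\widetilde{\Delta\theta}$ and add it back onto the pre-trained weights.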