Activation-Informed Merging of Large Language Models
February 4, 2025
Authors: Amin Heyrani Nobari, Kaveh Alimohammadi, Ali ArjomandBigdeli, Akash Srivastava, Faez Ahmed, Navid Azizan
cs.AI
Abstract
Model merging, a method that combines the parameters and embeddings of
multiple fine-tuned large language models (LLMs), offers a promising approach
to enhance model performance across various tasks while maintaining
computational efficiency. This paper introduces Activation-Informed Merging
(AIM), a technique that integrates the information from the activation space of
LLMs into the merging process to improve performance and robustness. AIM is
designed as a flexible, complementary solution that is applicable to any
existing merging method. It aims to preserve critical weights from the base
model, drawing on principles from continual learning (CL) and model
compression. Utilizing a task-agnostic calibration set, AIM selectively
prioritizes essential weights during merging. We empirically demonstrate that
AIM significantly enhances the performance of merged models across multiple
benchmarks. Our findings suggest that considering the activation-space
information can provide substantial advancements in the model merging
strategies for LLMs, with up to a 40% increase in benchmark performance.
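The abstract describes AIM at a high level: derive per-weight importance from the base model's activations on a task-agnostic calibration set, then protect the important base-model weights when applying any existing merging method. The sketch below illustrates that general idea only; the function names, the importance proxy (mean absolute activation), and the linear interpolation rule are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of activation-informed merging: weights fed by strongly
# activated input features on a calibration set are pulled back toward the
# base model, while less important weights keep the merged value.
import numpy as np

def activation_importance(calib_acts: np.ndarray) -> np.ndarray:
    # Importance proxy (assumed): mean absolute activation per input feature.
    return np.mean(np.abs(calib_acts), axis=0)

def aim_postprocess(base_w: np.ndarray, merged_w: np.ndarray,
                    calib_acts: np.ndarray, omega: float = 0.5) -> np.ndarray:
    """Relax a merged weight matrix toward the base model on important rows.

    base_w, merged_w: (in_features, out_features) weight matrices.
    calib_acts:       (n_samples, in_features) activations from a
                      task-agnostic calibration set.
    omega:            strength of the pull toward the base model.
    """
    imp = activation_importance(calib_acts)        # (in_features,)
    imp = imp / (imp.max() + 1e-12)                # normalize to [0, 1]
    alpha = omega * imp[:, None]                   # per-row mixing factor
    # Convex combination: important rows stay close to the base model.
    return alpha * base_w + (1.0 - alpha) * merged_w

# Usage: post-process the output of any existing merging method.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 4))      # base-model weights
merged = rng.normal(size=(8, 4))    # weights from e.g. task arithmetic / TIES
calib = rng.normal(size=(32, 8))    # calibration-set activations
w_aim = aim_postprocess(base, merged, calib)
```

Because AIM is framed as complementary to any merging method, the sketch takes the already-merged weights as input rather than performing the merge itself.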