

Activation-Informed Merging of Large Language Models

February 4, 2025
作者: Amin Heyrani Nobari, Kaveh Alimohammadi, Ali ArjomandBigdeli, Akash Srivastava, Faez Ahmed, Navid Azizan
cs.AI

Abstract

Model merging, a method that combines the parameters and embeddings of multiple fine-tuned large language models (LLMs), offers a promising approach to enhancing model performance across various tasks while maintaining computational efficiency. This paper introduces Activation-Informed Merging (AIM), a technique that integrates information from the activation space of LLMs into the merging process to improve performance and robustness. AIM is designed as a flexible, complementary solution that is applicable to any existing merging method. It aims to preserve critical weights from the base model, drawing on principles from continual learning (CL) and model compression. Utilizing a task-agnostic calibration set, AIM selectively prioritizes essential weights during merging. We empirically demonstrate that AIM significantly enhances the performance of merged models across multiple benchmarks. Our findings suggest that considering activation-space information can provide substantial advancements in model merging strategies for LLMs, with up to a 40% increase in benchmark performance.
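The core idea in the abstract — estimating weight importance from activations on a task-agnostic calibration set and pulling important weights back toward the base model during merging — can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm; the function name `aim_merge`, the parameter `omega`, and the per-channel importance heuristic are all assumptions made for this example.

```python
import numpy as np

def aim_merge(base_w, merged_w, calib_acts, omega=0.5):
    """Illustrative sketch of activation-informed merging.

    Channel importance is estimated from mean absolute activations on a
    task-agnostic calibration set; high-importance channels are interpolated
    back toward the base model's weights. Names and the exact importance
    heuristic are hypothetical, not taken from the paper.
    """
    # Per-input-channel importance from mean |activation| over calibration samples
    importance = np.mean(np.abs(calib_acts), axis=0)
    importance = importance / importance.max()  # normalize to [0, 1]
    # Interpolation factor: important channels stay closer to the base model
    alpha = omega * importance                  # shape: (in_features,)
    # base_w, merged_w have shape (out_features, in_features); broadcast over rows
    return alpha[None, :] * base_w + (1.0 - alpha[None, :]) * merged_w
```

With `omega=0` the result reduces to the plain merged weights; larger `omega` biases heavily activated channels toward the base model, which is the preservation behavior the abstract describes.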

