AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
March 31, 2025
Authors: Yiyang Du, Xiaochen Wang, Chi Chen, Jiabo Ye, Yiru Wang, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Zhifang Sui, Maosong Sun, Yang Liu
cs.AI
Abstract
Recently, model merging methods have demonstrated powerful strengths in
combining the abilities of multiple Large Language Models (LLMs) across
various tasks. However, previous model merging methods mainly focus on merging
homogeneous models with identical architectures, and they meet challenges when
dealing with Multimodal Large Language Models (MLLMs), which are inherently
heterogeneous: they differ in model architecture and exhibit asymmetry in the
parameter space. In this work, we propose AdaMMS, a novel model merging method
tailored for heterogeneous MLLMs. Our method tackles these challenges in three
steps: mapping, merging, and searching. Specifically, we first design a
mapping function between models so that model merging can be applied to MLLMs
with different architectures. We then apply linear interpolation to the model
weights to actively adapt to the asymmetry of heterogeneous MLLMs. Finally, in
the hyper-parameter searching step, we propose an unsupervised hyper-parameter
selection method for model merging. AdaMMS is the first model merging method
capable of merging heterogeneous MLLMs without labeled data, and extensive
experiments on various model combinations demonstrate that it outperforms
previous model merging methods on various vision-language benchmarks.
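The three-step procedure described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `name_map` correspondence, the grid over coefficients, and the `score_fn` stand-in for the unsupervised selection criterion are all illustrative assumptions; the paper's actual mapping function and coefficient-selection method are more involved.

```python
def map_parameters(donor_weights, name_map):
    """Step 1 (mapping): project donor-model weights into the base model's
    parameter namespace. `name_map` is a hypothetical base-name -> donor-name
    correspondence standing in for the paper's cross-architecture mapping."""
    return {base: donor_weights[donor] for base, donor in name_map.items()}

def merge(base_weights, mapped_weights, alpha):
    """Step 2 (merging): linearly interpolate each shared parameter,
    (1 - alpha) * base + alpha * donor."""
    return {name: (1 - alpha) * w + alpha * mapped_weights[name]
            for name, w in base_weights.items() if name in mapped_weights}

def search_alpha(base_weights, mapped_weights, score_fn, steps=11):
    """Step 3 (searching): grid-search the interpolation coefficient and keep
    the merge maximizing an unsupervised score (no labeled data); `score_fn`
    is a placeholder for the paper's selection criterion."""
    candidates = [i / (steps - 1) for i in range(steps)]
    return max(candidates,
               key=lambda a: score_fn(merge(base_weights, mapped_weights, a)))
```

In practice the weights would be model tensors rather than scalars, but the structure is the same: mapping makes the two parameter spaces comparable, interpolation produces a family of merged models, and the unsupervised score picks one coefficient from that family.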