

No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces

February 7, 2025
作者: Daniel Marczak, Simone Magistri, Sebastian Cygert, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer
cs.AI

Abstract

Model merging integrates the weights of multiple task-specific models into a single multi-task model. Despite recent interest in the problem, a significant performance gap between the combined and single-task models remains. In this paper, we investigate the key characteristics of task matrices -- weight update matrices applied to a pre-trained model -- that enable effective merging. We show that alignment between singular components of task-specific and merged matrices strongly correlates with performance improvement over the pre-trained model. Based on this, we propose an isotropic merging framework that flattens the singular value spectrum of task matrices, enhances alignment, and reduces the performance gap. Additionally, we incorporate both common and task-specific subspaces to further improve alignment and performance. Our proposed approach achieves state-of-the-art performance across multiple scenarios, including various sets of tasks and model scales. This work advances the understanding of model merging dynamics, offering an effective methodology to merge models without requiring additional training. Code is available at https://github.com/danielm1405/iso-merging .
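The core idea in the abstract — summing task-specific weight updates and then flattening the singular value spectrum of the result — can be sketched in a few lines. This is a minimal illustration of spectrum flattening, not the authors' exact method (their implementation, including the common/task-specific subspace variant, is in the linked repository); `isotropic_merge` and its `scale` parameter are hypothetical names for this sketch.

```python
import numpy as np

def isotropic_merge(task_matrices, scale=1.0):
    """Hypothetical sketch: merge task (weight-update) matrices by
    summing them, then flatten the singular value spectrum of the sum
    so every singular direction contributes equally (isotropy)."""
    # Sum of task-specific weight updates (the task-arithmetic baseline).
    merged = np.sum(task_matrices, axis=0)
    # Decompose the merged update into its singular components.
    U, s, Vt = np.linalg.svd(merged, full_matrices=False)
    # Flatten the spectrum: replace every singular value by the mean,
    # preserving the singular directions but equalizing their weight.
    s_iso = np.full_like(s, s.mean())
    # Reconstruct the isotropic update, with an optional merging scale.
    return scale * (U * s_iso) @ Vt
```

After this transform, all singular values of the returned update are equal, which is the "flattened spectrum" property the abstract argues improves alignment between task-specific and merged matrices.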

