Exploring Model Kinship for Merging Large Language Models
October 16, 2024
Authors: Yedi Hu, Yunzhi Yao, Ningyu Zhang, Shumin Deng, Huajun Chen
cs.AI
Abstract
Model merging has become one of the key technologies for enhancing the capabilities and efficiency of Large Language Models (LLMs). However, our understanding of the expected performance gains and the principles involved when merging any two models remains limited. In this work, we introduce model kinship, the degree of similarity or relatedness between LLMs, analogous to kinship in biological evolution. Through comprehensive empirical analysis, we find that model kinship is related to the performance gains obtained after model merging, which can help guide the selection of candidate models. Inspired by this, we propose a new model merging strategy, Top-k Greedy Merging with Model Kinship, which yields better performance on benchmark datasets. Specifically, we discover that using model kinship as a criterion allows us to perform model merging continuously while alleviating the degradation (local optima) in model evolution, and that model kinship can serve as a guide to escape these traps. Code is available at https://github.com/zjunlp/ModelKinship.
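As a rough, hedged illustration of the concept (not the authors' exact formulation), model kinship can be pictured as a similarity score over the weight deltas that fine-tuning introduces relative to a shared base model. In the PyTorch sketch below, `model_kinship`, `pairwise_kinship`, and the choice of cosine similarity over flattened deltas are illustrative assumptions; see the linked repository for the actual implementation.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def model_kinship(model_a, model_b, base_model):
    """Illustrative kinship score: cosine similarity between the flattened
    weight deltas ("task vectors") of two fine-tuned models, each measured
    against the shared base model they were fine-tuned from."""
    base = dict(base_model.named_parameters())
    params_b = dict(model_b.named_parameters())
    delta_a, delta_b = [], []
    for name, p_a in model_a.named_parameters():
        delta_a.append((p_a - base[name]).flatten())
        delta_b.append((params_b[name] - base[name]).flatten())
    return F.cosine_similarity(
        torch.cat(delta_a), torch.cat(delta_b), dim=0
    ).item()


def pairwise_kinship(models, base_model):
    """Usage sketch: score every candidate pair so a strategy such as
    Top-k Greedy Merging can consult kinship when choosing what to merge
    next. `models` is assumed to be a list of torch.nn.Module instances
    fine-tuned from `base_model`."""
    return {
        (i, j): model_kinship(models[i], models[j], base_model)
        for i in range(len(models))
        for j in range(i + 1, len(models))
    }
```

Other vector similarity metrics (e.g., Pearson correlation) could be substituted for cosine similarity over the same deltas without changing the overall framework.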