Exploring Model Kinship for Merging Large Language Models
October 16, 2024
Authors: Yedi Hu, Yunzhi Yao, Ningyu Zhang, Shumin Deng, Huajun Chen
cs.AI
Abstract
Model merging has become one of the key technologies for enhancing the capabilities and efficiency of Large Language Models (LLMs). However, our understanding of the expected performance gains and the principles involved when merging any two models remains limited. In this work, we introduce model kinship, the degree of similarity or relatedness between LLMs, analogous to kinship in biological evolution. Through comprehensive empirical analysis, we find that model kinship is related to the performance gains obtained after model merging, which can help guide the selection of candidate models. Inspired by this, we propose a new model merging strategy, Top-k Greedy Merging with Model Kinship, which yields better performance on benchmark datasets. Specifically, we discover that using model kinship as a criterion allows us to perform model merging continuously while alleviating the degradation (local optima) in model evolution, and that model kinship can serve as a guide to escape these traps. Code is available at https://github.com/zjunlp/ModelKinship.
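As a rough, hedged illustration of the concept (not the authors' exact formulation), model kinship can be pictured as a similarity score over the weight deltas that fine-tuning introduces relative to a shared base model. In the PyTorch sketch below, `model_kinship`, `pairwise_kinship`, and the choice of cosine similarity over flattened deltas are illustrative assumptions; see the linked repository for the actual implementation.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def model_kinship(model_a, model_b, base_model):
    """Illustrative kinship score: cosine similarity between the flattened
    weight deltas ("task vectors") of two fine-tuned models, each measured
    against the shared base model they were fine-tuned from."""
    base = dict(base_model.named_parameters())
    params_b = dict(model_b.named_parameters())
    delta_a, delta_b = [], []
    for name, p_a in model_a.named_parameters():
        delta_a.append((p_a - base[name]).flatten())
        delta_b.append((params_b[name] - base[name]).flatten())
    return F.cosine_similarity(
        torch.cat(delta_a), torch.cat(delta_b), dim=0
    ).item()


def pairwise_kinship(models, base_model):
    """Usage sketch: score every candidate pair so a strategy such as
    Top-k Greedy Merging can consult kinship when choosing what to merge
    next. `models` is assumed to be a list of torch.nn.Module instances
    fine-tuned from `base_model`."""
    return {
        (i, j): model_kinship(models[i], models[j], base_model)
        for i in range(len(models))
        for j in range(i + 1, len(models))
    }
```

Other vector similarity metrics (e.g., Pearson correlation) could be substituted for cosine similarity over the same deltas without changing the overall framework.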