DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
March 10, 2025
Authors: Jongwoo Ko, Tianyi Chen, Sungnyun Kim, Tianyu Ding, Luming Liang, Ilya Zharkov, Se-Young Yun
cs.AI
Abstract
Despite the success of distillation in large language models (LLMs), most
prior work applies identical loss functions to both teacher- and
student-generated data. These strategies overlook the synergy between loss
formulations and data types, leading to a suboptimal performance boost in
student models. To address this, we propose DistiLLM-2, a contrastive approach
that simultaneously increases the likelihood of teacher responses and decreases
that of student responses by harnessing this synergy. Our extensive experiments
show that DistiLLM-2 not only builds high-performing student models across a
wide range of tasks, including instruction-following and code generation, but
also supports diverse applications, such as preference alignment and
vision-language extensions. These findings highlight the potential of a
contrastive approach to enhance the efficacy of LLM distillation by effectively
aligning teacher and student models across varied data types.
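To make the idea of pairing a different loss with each data type concrete, here is a minimal sketch over toy next-token distributions. The abstract does not say which losses are paired with which data, so the choice below is an assumption made only for illustration: a skewed forward KL on teacher-generated positions (which pushes student probability up where the teacher puts mass) and a skewed reverse KL on student-generated positions (which penalizes student probability where the teacher puts little mass). The function names, the skew coefficient `alpha`, and the equal weighting of the two terms are all hypothetical.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mix(a, b, alpha):
    """Pointwise interpolation alpha * a + (1 - alpha) * b."""
    return [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]


def skew_kl(teacher, student, alpha=0.1):
    """Skewed forward KL: KL(teacher || alpha * teacher + (1 - alpha) * student)."""
    return kl(teacher, mix(teacher, student, alpha))


def skew_reverse_kl(teacher, student, alpha=0.1):
    """Skewed reverse KL: KL(student || alpha * student + (1 - alpha) * teacher)."""
    return kl(student, mix(student, teacher, alpha))


def contrastive_distill_loss(teacher_on_t, student_on_t,
                             teacher_on_s, student_on_s, alpha=0.1):
    """Assumed pairing: forward-style loss on a teacher-generated position,
    reverse-style loss on a student-generated position, equally weighted."""
    loss_teacher_data = skew_kl(teacher_on_t, student_on_t, alpha)
    loss_student_data = skew_reverse_kl(teacher_on_s, student_on_s, alpha)
    return 0.5 * (loss_teacher_data + loss_student_data)


# Toy distributions over a 3-token vocabulary (made-up numbers).
teacher = [0.7, 0.2, 0.1]
far_student = [0.1, 0.2, 0.7]
near_student = [0.6, 0.25, 0.15]

print(contrastive_distill_loss(teacher, far_student, teacher, far_student))
print(contrastive_distill_loss(teacher, near_student, teacher, near_student))
```

In a real training loop the distributions would come from teacher and student logits at each token position of the sampled responses, and the two terms would be averaged over positions and batches; this sketch only shows the shape of the objective under the stated assumptions.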