TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation

March 6, 2025
Authors: Lin Sun, Guangxiang Zhao, Xiaoqi Jian, Yuhan Wu, Weihong Lin, Yongfu Zhu, Change Jia, Linglin Zhang, Jinzhu Wu, Junfeng Ran, Sai-er Hu, Zihan Jiang, Junting Zhou, Wenrui Liu, Bin Cui, Tong Yang, Xiangzheng Zhang
cs.AI

Abstract

The challenge of reducing the size of Large Language Models (LLMs) while maintaining their performance has gained significant attention. However, existing methods, such as model distillation and transfer learning, often fail to achieve high accuracy. To address this limitation, we introduce the Branch-Merge distillation approach, which enhances model compression through two phases: (1) the Branch Phase, where knowledge from a large teacher model is selectively distilled into specialized student models via domain-specific supervised fine-tuning (SFT); and (2) the Merge Phase, where these student models are merged to enable cross-domain knowledge transfer and improve generalization. We validate our distillation approach using DeepSeek-R1 as the teacher and DeepSeek-R1-Distill-Qwen-32B as the student. The resulting merged model, TinyR1-32B-Preview, outperforms its counterpart DeepSeek-R1-Distill-Qwen-32B across multiple benchmarks, including Mathematics (+5.5 points), Coding (+4.4 points), and Science (+2.9 points), while achieving near-equal performance to DeepSeek-R1 on AIME 2024. The Branch-Merge distillation approach provides a scalable solution for creating smaller, high-performing LLMs with reduced computational cost and time.
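
The abstract does not spell out how the Merge Phase combines the domain-specialized students, so the sketch below illustrates it with plain weighted averaging of same-architecture checkpoints; the function name merge_state_dicts, the uniform weights, and the toy tensors are illustrative assumptions, not the paper's actual merging method.

```python
# Minimal sketch of the Merge Phase as weighted parameter averaging.
# NOTE: the abstract does not specify the merge algorithm; uniform
# averaging here is an illustrative stand-in, not the paper's method.
from collections import OrderedDict
from typing import Dict, List, Optional

import torch


def merge_state_dicts(
    student_state_dicts: List[Dict[str, torch.Tensor]],
    weights: Optional[List[float]] = None,
) -> "OrderedDict[str, torch.Tensor]":
    """Merge domain-specialized student checkpoints by weighted averaging.

    All students are assumed to share one architecture (e.g. several
    DeepSeek-R1-Distill-Qwen-32B checkpoints fine-tuned on math, code,
    and science data), so their state dicts have identical keys/shapes.
    """
    if weights is None:
        weights = [1.0 / len(student_state_dicts)] * len(student_state_dicts)
    assert len(weights) == len(student_state_dicts)

    merged: "OrderedDict[str, torch.Tensor]" = OrderedDict()
    for key in student_state_dicts[0]:
        # Accumulate in fp32 for numerical stability.
        merged[key] = sum(
            w * sd[key].float() for w, sd in zip(weights, student_state_dicts)
        )
    return merged


def toy_student() -> Dict[str, torch.Tensor]:
    """Stand-in for one fine-tuned student's state dict."""
    return {"linear.weight": torch.randn(4, 4)}


if __name__ == "__main__":
    torch.manual_seed(0)
    math_sd, code_sd, sci_sd = toy_student(), toy_student(), toy_student()
    merged = merge_state_dicts([math_sd, code_sd, sci_sd])
    print(merged["linear.weight"].shape)  # torch.Size([4, 4])
```

In the setting the abstract describes, each input would be a full 32B-parameter student produced by domain-specific SFT in the Branch Phase; the per-student weights could in principle be tuned per domain or per layer rather than kept uniform.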
