Small Models Struggle to Learn from Strong Reasoners

February 17, 2025
Authors: Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran
cs.AI

Abstract

Large language models (LLMs) excel in complex reasoning tasks, and distilling their reasoning capabilities into smaller models has shown promise. However, we uncover an interesting phenomenon, which we term the Small Model Learnability Gap: small models (≤3B parameters) do not consistently benefit from long chain-of-thought (CoT) reasoning or distillation from larger models. Instead, they perform better when fine-tuned on shorter, simpler reasoning chains that better align with their intrinsic learning capacity. To address this, we propose Mix Distillation, a simple yet effective strategy that balances reasoning complexity by combining long and short CoT examples or reasoning from both larger and smaller models. Our experiments demonstrate that Mix Distillation significantly improves small model reasoning performance compared to training on either data alone. These findings highlight the limitations of direct strong model distillation and underscore the importance of adapting reasoning complexity for effective reasoning capability transfer.
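To make the Mix Distillation recipe concrete, below is a minimal sketch of how a mixed fine-tuning set could be assembled. Everything here is an illustrative assumption rather than the paper's released code: the function `build_mix_distillation_set`, the record fields `prompt`, `long_cot`, and `short_cot`, and the `long_ratio` mixing weight are hypothetical names, and the paper tunes the long/short balance rather than fixing it at 0.5.

```python
import random

def build_mix_distillation_set(examples, long_ratio=0.5, seed=0):
    """Assemble a fine-tuning set mixing long and short CoT traces.

    `examples` is assumed to be a list of dicts with keys "prompt",
    "long_cot" (e.g., a trace distilled from a strong reasoner), and
    "short_cot" (e.g., a shorter trace or one from a smaller model).
    `long_ratio` is a hypothetical mixing weight controlling how often
    the long trace is chosen as the training target.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible mix
    mixed = []
    for ex in examples:
        # Per example, sample which reasoning style becomes the target.
        target = ex["long_cot"] if rng.random() < long_ratio else ex["short_cot"]
        mixed.append({"prompt": ex["prompt"], "response": target})
    return mixed

# Toy usage: one prompt paired with a long and a short reasoning trace.
data = [{
    "prompt": "What is 17 * 6?",
    "long_cot": "17 * 6 = (10 + 7) * 6 = 60 + 42 = 102. Answer: 102.",
    "short_cot": "17 * 6 = 102.",
}]
print(build_mix_distillation_set(data, long_ratio=0.5))
```

The key design point is that mixing happens at the example level, so the small model sees both detailed and concise reasoning styles during fine-tuning rather than only one extreme.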
