LLM Modules: Knowledge Transfer from a Large to a Small Model using Enhanced Cross-Attention
February 12, 2025
Author: Konstantin Kolomeitsev
cs.AI
Abstract
In this work, we propose an architecture of LLM Modules that enables the
transfer of knowledge from a large pre-trained model to a smaller model using
an Enhanced Cross-Attention mechanism. In the proposed scheme, the Qwen2-1.5B
model is frozen and its representations are passed through specially designed
attention layers to the GPT-Neo-125M model, which is trained on limited
computational resources. Experimental results on the Bespoke-Stratos-17k
dataset demonstrate that after 15 epochs of training, the combined model
generates responses comparable in quality to those obtained by distillation. We
discuss the advantages of the modular approach, provide examples of input
queries and comparative analysis, and outline prospects for further extension
of the method.
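As a rough illustration of the scheme described in the abstract, the sketch below wires a cross-attention bridge between a frozen donor model's hidden states and a smaller trainable model's hidden states. This is a minimal sketch, not the authors' released implementation: the linear projection, zero-initialized gate, and head count are assumptions made here for illustration; only the model names and their hidden sizes (1536 for Qwen2-1.5B, 768 for GPT-Neo-125M) come from the models themselves.

```python
# Illustrative sketch of a cross-attention bridge from a frozen large model
# to a small trainable model. Not the paper's exact "Enhanced Cross-Attention"
# design; projection, gating, and head count are assumptions.
import torch
import torch.nn as nn


class CrossAttentionBridge(nn.Module):
    """Queries come from the small model; keys/values come from the
    frozen donor model's hidden states."""

    def __init__(self, small_dim: int, large_dim: int, num_heads: int = 8):
        super().__init__()
        # Project donor representations into the small model's hidden width.
        self.kv_proj = nn.Linear(large_dim, small_dim)
        self.attn = nn.MultiheadAttention(small_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(small_dim)
        # Zero-initialized gate: training starts from the unmodified small model.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, small_hidden: torch.Tensor, large_hidden: torch.Tensor) -> torch.Tensor:
        kv = self.kv_proj(large_hidden)
        attn_out, _ = self.attn(query=small_hidden, key=kv, value=kv)
        return self.norm(small_hidden + torch.tanh(self.gate) * attn_out)


# Tensor-level usage with the models' hidden sizes
# (1536 for Qwen2-1.5B, 768 for GPT-Neo-125M).
bridge = CrossAttentionBridge(small_dim=768, large_dim=1536)
small_h = torch.randn(2, 16, 768)   # [batch, seq, dim] from GPT-Neo-125M
large_h = torch.randn(2, 16, 1536)  # [batch, seq, dim] from frozen Qwen2-1.5B
fused = bridge(small_h, large_h)    # same shape as small_h
```

Starting the gate at zero is one common way to let the combined model begin as the plain small model and gradually blend in the frozen donor's representations during training; whether the paper uses this particular gating is not stated in the abstract.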