SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models
March 10, 2025
Authors: Xun Liang, Hanyu Wang, Huayi Lai, Simin Niu, Shichao Song, Jiawei Yang, Jihao Zhao, Feiyu Xiong, Bo Tang, Zhiyu Li
cs.AI
Abstract
Large Language Models (LLMs) have achieved remarkable success across various natural language processing tasks, yet their high computational cost during inference remains a major bottleneck. This paper introduces Sparse Expert Activation Pruning (SEAP), a training-free pruning method that selectively retains task-relevant parameters to reduce inference overhead. Inspired by the clustering patterns of hidden states and activations in LLMs, SEAP identifies task-specific expert activation patterns and prunes the model while preserving task performance and enhancing computational efficiency. Experimental results demonstrate that SEAP significantly reduces computational overhead while maintaining competitive accuracy. Notably, at 50% pruning, SEAP surpasses both WandA and FLAP by over 20%, and at 20% pruning, it incurs only a 2.2% performance drop compared to the dense model. These findings highlight SEAP's scalability and effectiveness, making it a promising approach for optimizing large-scale LLMs.
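
To make the idea of task-aware, activation-based pruning concrete, below is a minimal hypothetical sketch in PyTorch: per-neuron activation magnitudes are averaged over a task's calibration data, and the lowest-scoring fraction of neurons is masked out. The function names and the mean-absolute-activation scoring rule are illustrative assumptions, not the exact SEAP procedure described in the paper.

import torch

def task_activation_scores(activations):
    # activations: (num_tokens, hidden_dim) tensor collected from one FFN layer
    # while running calibration examples from a single task.
    return activations.abs().mean(dim=0)  # per-neuron mean |activation|

def build_prune_mask(scores, sparsity):
    # Keep the top (1 - sparsity) fraction of neurons by task-specific score.
    num_keep = int(scores.numel() * (1.0 - sparsity))
    keep_idx = torch.topk(scores, num_keep).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[keep_idx] = True
    return mask

# Toy usage: 512 calibration tokens, 1024 FFN neurons, 50% pruning.
acts = torch.randn(512, 1024)
mask = build_prune_mask(task_activation_scores(acts), sparsity=0.5)
print(f"{mask.sum().item()} of {mask.numel()} neurons kept")

In a full pipeline, such a mask would be applied to the corresponding rows and columns of the layer's weight matrices so that the pruned parameters are skipped at inference time, without any retraining.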