
MH-MoE:Multi-Head Mixture-of-Experts

November 25, 2024
Authors: Shaohan Huang, Xun Wu, Shuming Ma, Furu Wei
cs.AI

Abstract

Multi-Head Mixture-of-Experts (MH-MoE) demonstrates superior performance by using the multi-head mechanism to collectively attend to information from various representation spaces within different experts. In this paper, we present a novel implementation of MH-MoE that maintains both FLOPs and parameter parity with sparse Mixture of Experts models. Experimental results on language models show that the new implementation yields quality improvements over both vanilla MoE and fine-grained MoE models. Additionally, our experiments demonstrate that MH-MoE is compatible with 1-bit Large Language Models (LLMs) such as BitNet.
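
As a rough illustration of the mechanism the abstract describes, the sketch below splits each token's hidden vector into head-wise sub-tokens, routes every sub-token to a top-1 expert FFN, and merges the expert outputs back into a full-width token. This is a minimal sketch under assumed names (`MHMoELayer`, `head_proj`, `merge_proj`, `d_ff` are illustrative, not the paper's code), and it does not reproduce the paper's FLOPs/parameter-parity accounting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MHMoELayer(nn.Module):
    """Minimal multi-head mixture-of-experts sketch (illustrative, not the paper's implementation).

    Each token is split into `num_heads` sub-tokens, every sub-token is
    routed to its top-1 expert, and expert outputs are concatenated back
    into a full-width token representation.
    """

    def __init__(self, d_model=512, num_heads=4, num_experts=8, d_ff=1024):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Head-split and merge projections (hypothetical names).
        self.head_proj = nn.Linear(d_model, d_model)
        self.merge_proj = nn.Linear(d_model, d_model)
        # Router scores each sub-token against the experts.
        self.router = nn.Linear(self.d_head, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(self.d_head, d_ff),
                nn.GELU(),
                nn.Linear(d_ff, self.d_head),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (batch, seq, d_model)
        b, s, d = x.shape
        # Split each token into num_heads sub-tokens of size d_head.
        sub = self.head_proj(x).reshape(b * s * self.num_heads, self.d_head)

        # Top-1 routing per sub-token.
        weights = F.softmax(self.router(sub), dim=-1)
        top_w, top_idx = weights.max(dim=-1)

        out = torch.zeros_like(sub)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Scale each expert output by its routing weight.
                out[mask] = top_w[mask].unsqueeze(-1) * expert(sub[mask])

        # Merge sub-tokens back into full-width tokens.
        return self.merge_proj(out.reshape(b, s, d))
```

For example, `MHMoELayer()(torch.randn(2, 16, 512))` returns a tensor of the same `(2, 16, 512)` shape; in the paper, head count and expert granularity are chosen so that FLOPs and parameters stay on par with a standard sparse MoE baseline.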

