Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Abstract

In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's performance across various benchmarks, including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms LLama3.1-70B and exhibits performance comparable to the significantly larger LLama3.1-405B model. Key practices of Hunyuan-Large include large-scale synthetic data that is orders of magnitude larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. We also investigate the scaling laws and learning rate schedule of mixture of experts models, providing valuable insights and guidance for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications. Code: https://github.com/Tencent/Hunyuan-Large Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large
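To make the distinction between total and activated parameters concrete, the following is a minimal sketch of generic top-k expert routing. It is an illustration under stated assumptions, not Hunyuan-Large's routing strategy: the function names `route_top_k` and `moe_forward`, the scalar toy experts, and the router logits are all hypothetical. The only figures taken from the abstract are the 389 billion total and 52 billion activated parameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so that they sum to 1 over the selected experts.
    This is generic top-k gating, assumed here purely for illustration.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and combine their outputs by weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy setup: four scalar "experts", one of which runs per token.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
y = moe_forward(10.0, experts, router_logits=[0.1, 2.0, 0.3, -1.0], k=1)

# Share of parameters used per token, from the figures in the abstract.
activated_fraction = 52 / 389
print(f"output={y}, activated fraction={activated_fraction:.3f}")
```

Because only the selected experts run for each token, per-token compute tracks the activated parameter count (roughly 13% of the total here) rather than the full model size.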
