Xmodel-2 Technical Report

December 27, 2024
Authors: Wang Qun, Liu Yang, Lin Qingquan, Qu Zhijiu, Jiang Ling
cs.AI

Abstract

Xmodel-2 is a 1.2-billion-parameter large language model designed specifically for reasoning tasks. Its architecture enables different model scales to share a unified set of hyperparameters, allowing for extensive experimentation on smaller models and seamless transfer of optimal configurations to larger models. To maximize training efficiency and stability, Xmodel-2 employs the WSD learning rate scheduler from MiniCPM. Pretrained on 1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art performance in complex reasoning and agent-based tasks, while maintaining low training costs. These results highlight the potential of efficient model design and training strategies in advancing reasoning capabilities. Model checkpoints and code are publicly available on GitHub: https://github.com/XiaoduoAILab/Xmodel-2
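
As background on the scheduler named in the abstract, below is a minimal Python sketch of a Warmup-Stable-Decay (WSD) learning-rate schedule: a linear warmup, a long constant plateau, and a rapid anneal at the end. The function name, phase lengths, peak rate, floor rate, and exponential decay form are illustrative assumptions, not the exact configuration used to train Xmodel-2.

def wsd_lr(step: int,
           max_lr: float = 1e-2,
           min_lr: float = 1e-4,
           warmup_steps: int = 2_000,
           stable_steps: int = 90_000,
           decay_steps: int = 8_000) -> float:
    """Learning rate at `step` under a Warmup-Stable-Decay schedule.

    All phase lengths and rates here are hypothetical placeholders.
    """
    if step < warmup_steps:
        # Warmup: linear ramp from 0 up to the peak rate.
        return max_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak rate for the bulk of training.
        return max_lr
    # Decay: exponential anneal from max_lr down to min_lr over the
    # final phase, then hold the floor.
    t = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return max_lr * (min_lr / max_lr) ** t

# Example: rates at a few points in a 100k-step run.
for s in (0, 1_000, 2_000, 50_000, 96_000, 100_000):
    print(s, wsd_lr(s))

A practical property of this shape, and a stated motivation for WSD-style schedules, is that the long constant plateau lets training be resumed or extended without re-planning the whole schedule; only the short decay phase must be rerun from a plateau checkpoint.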
