Xmodel-2 Technical Report

December 27, 2024
Authors: Wang Qun, Liu Yang, Lin Qingquan, Qu Zhijiu, Jiang Ling
cs.AI

Abstract

Xmodel-2 is a 1.2-billion-parameter large language model designed specifically for reasoning tasks. Its architecture enables different model scales to share a unified set of hyperparameters, allowing for extensive experimentation on smaller models and seamless transfer of optimal configurations to larger models. To maximize training efficiency and stability, Xmodel-2 employs the WSD learning rate scheduler from MiniCPM. Pretrained on 1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art performance in complex reasoning and agent-based tasks, while maintaining low training costs. These results highlight the potential of efficient model design and training strategies in advancing reasoning capabilities. Model checkpoints and code are publicly available on GitHub at https://github.com/XiaoduoAILab/Xmodel-2
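The abstract credits the WSD (Warmup-Stable-Decay) learning rate scheduler from MiniCPM for training efficiency and stability. A minimal sketch of the three-phase shape is below; the phase fractions, minimum-LR ratio, and the linear decay form are illustrative assumptions, not Xmodel-2's or MiniCPM's actual settings.

```python
def wsd_lr(step, total_steps, peak_lr,
           warmup_frac=0.01, decay_frac=0.1, min_lr_ratio=0.1):
    """Warmup-Stable-Decay learning rate schedule (sketch).

    Phases: linear warmup to peak_lr, a long constant "stable"
    phase, then a short decay at the end of training. All fraction
    values here are hypothetical defaults for illustration.
    """
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps

    if step < warmup_steps:
        # Linear warmup from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    if step < stable_end:
        # Stable phase: hold the peak learning rate constant.
        return peak_lr
    # Decay phase: anneal linearly from peak_lr toward
    # min_lr_ratio * peak_lr (MiniCPM's exact decay form may differ).
    progress = (step - stable_end) / max(1, decay_steps)
    return peak_lr * (1.0 - progress * (1.0 - min_lr_ratio))
```

A practical appeal of WSD, noted in the MiniCPM work, is that the long constant phase lets training continue from any stable-phase checkpoint and only the short decay tail needs rerunning to produce a finished model.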
