Xmodel-2 Technical Report
December 27, 2024
Authors: Wang Qun, Liu Yang, Lin Qingquan, Qu Zhijiu, Jiang Ling
cs.AI
Abstract
Xmodel-2 is a 1.2-billion-parameter large language model designed
specifically for reasoning tasks. Its architecture enables different model
scales to share a unified set of hyperparameters, allowing for extensive
experimentation on smaller models and seamless transfer of optimal
configurations to larger models. To maximize training efficiency and stability,
Xmodel-2 employs the WSD learning rate scheduler from MiniCPM. Pretrained on
1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art
performance in complex reasoning and agent-based tasks, while maintaining low
training costs. These results highlight the potential of efficient model design
and training strategies in advancing reasoning capabilities. Model checkpoints
and code are publicly available on GitHub at
https://github.com/XiaoduoAILab/Xmodel-2
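The abstract states that Xmodel-2's architecture lets different model scales share one set of hyperparameters, in the spirit of MiniCPM's muP-style (maximal update parametrization) width scaling. The sketch below illustrates the general idea only: the scaling rules shown are standard muP conventions, and the function and parameter names (`scaled_hparams`, `base_width`, `base_lr`) are hypothetical, not taken from the Xmodel-2 codebase.

```python
# Illustrative sketch of muP-style hyperparameter transfer. The exact
# multipliers Xmodel-2 uses follow MiniCPM and are specified in the full
# report; this is an assumed, simplified width-based scheme.

def scaled_hparams(width: int, base_width: int = 256,
                   base_lr: float = 0.01, base_init_std: float = 0.1):
    """Rescale a base (small-model) configuration to a wider model.

    Under muP-like rules, hidden-layer learning rates shrink as 1/width_mult
    and init scales as 1/sqrt(width_mult), so hyperparameters tuned at
    `base_width` transfer to larger widths without re-tuning.
    """
    width_mult = width / base_width
    return {
        "hidden_lr": base_lr / width_mult,              # per-layer LR shrinks with width
        "init_std": base_init_std / width_mult ** 0.5,  # keep activations O(1)
        "output_logit_mult": 1.0 / width_mult,          # temper logits of wider models
    }

# Tune on a small proxy model, then reuse the same base settings at scale:
small = scaled_hparams(width=256)   # proxy model for hyperparameter search
large = scaled_hparams(width=1536)  # hypothetical larger configuration
```

This is what makes the abstract's "extensive experimentation on smaller models" workable: the search runs at the proxy width, and only the deterministic rescaling changes at the target width.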
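The abstract also names the WSD (Warmup-Stable-Decay) learning rate scheduler from MiniCPM: a short linear warmup, a long constant ("stable") phase at peak learning rate, and a brief final decay. A minimal sketch follows, assuming illustrative phase fractions and an exponential decay shape; Xmodel-2's actual settings are given in the full report.

```python
# Minimal sketch of a WSD (Warmup-Stable-Decay) schedule. Phase fractions,
# peak_lr, and the exponential decay form are assumptions for illustration.

def wsd_lr(step: int, total_steps: int, peak_lr: float = 1e-3,
           warmup_frac: float = 0.01, decay_frac: float = 0.1,
           min_lr: float = 1e-5) -> float:
    """Return the learning rate at `step` under a WSD schedule.

    - Warmup: linear ramp from 0 to peak_lr.
    - Stable: hold peak_lr for most of training.
    - Decay:  exponential anneal from peak_lr down to min_lr.
    """
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    decay_start = total_steps - decay_steps

    if step < warmup_steps:            # warmup phase
        return peak_lr * step / max(1, warmup_steps)
    if step < decay_start:             # stable phase
        return peak_lr
    # decay phase: geometric interpolation from peak_lr to min_lr
    progress = (step - decay_start) / max(1, decay_steps)
    return peak_lr * (min_lr / peak_lr) ** progress
```

One property MiniCPM highlights for WSD, relevant to the abstract's emphasis on training efficiency, is that the stable phase can be extended or resumed from a checkpoint without re-planning a full cosine cycle, which suits continual pretraining on a growing token budget.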