Phi-4 Technical Report
December 12, 2024
Authors: Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J. Hewett, Mojan Javaheripi, Piero Kauffmann, James R. Lee, Yin Tat Lee, Yuanzhi Li, Weishung Liu, Caio C. T. Mendes, Anh Nguyen, Eric Price, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Xin Wang, Rachel Ward, Yue Wu, Dingli Yu, Cyril Zhang, Yi Zhang
cs.AI
Abstract
We present phi-4, a 14-billion parameter language model developed with a
training recipe that is centrally focused on data quality. Unlike most language
models, where pre-training is based primarily on organic data sources such as
web content or code, phi-4 strategically incorporates synthetic data throughout
the training process. While previous models in the Phi family largely distill
the capabilities of a teacher model (specifically GPT-4), phi-4 substantially
surpasses its teacher model on STEM-focused QA capabilities, giving evidence
that our data-generation and post-training techniques go beyond distillation.
Despite minimal changes to the phi-3 architecture, phi-4 achieves strong
performance relative to its size -- especially on reasoning-focused benchmarks
-- due to improved data, training curriculum, and innovations in the
post-training scheme.