Phi-4 Technical Report
December 12, 2024
Authors: Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J. Hewett, Mojan Javaheripi, Piero Kauffmann, James R. Lee, Yin Tat Lee, Yuanzhi Li, Weishung Liu, Caio C. T. Mendes, Anh Nguyen, Eric Price, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Xin Wang, Rachel Ward, Yue Wu, Dingli Yu, Cyril Zhang, Yi Zhang
cs.AI
Abstract
We present phi-4, a 14-billion parameter language model developed with a
training recipe that is centrally focused on data quality. Unlike most language
models, where pre-training is based primarily on organic data sources such as
web content or code, phi-4 strategically incorporates synthetic data throughout
the training process. While previous models in the Phi family largely distill
the capabilities of a teacher model (specifically GPT-4), phi-4 substantially
surpasses its teacher model on STEM-focused QA capabilities, giving evidence
that our data-generation and post-training techniques go beyond distillation.
Despite minimal changes to the phi-3 architecture, phi-4 achieves strong
performance relative to its size -- especially on reasoning-focused benchmarks
-- due to improved data, training curriculum, and innovations in the
post-training scheme.