DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
April 15, 2025
Authors: Zhiwei He, Tian Liang, Jiahao Xu, Qiuzhi Liu, Xingyu Chen, Yue Wang, Linfeng Song, Dian Yu, Zhenwen Liang, Wenxuan Wang, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu
cs.AI
Abstract
The capacity for complex mathematical reasoning is a key benchmark for
artificial intelligence. While reinforcement learning (RL) applied to LLMs
shows promise, progress is significantly hindered by the lack of large-scale
training data that is sufficiently challenging, possesses verifiable answer
formats suitable for RL, and is free from contamination with evaluation
benchmarks. To address these limitations, we introduce DeepMath-103K, a new,
large-scale dataset comprising approximately 103K mathematical problems,
specifically designed to train advanced reasoning models via RL. DeepMath-103K
is curated through a rigorous pipeline involving source analysis, stringent
decontamination against numerous benchmarks, and filtering for high difficulty
(primarily Levels 5-9), making it significantly more challenging than existing
open resources. Each problem includes a verifiable final answer, enabling rule-based
RL, and three distinct R1-generated solutions suitable for diverse training
paradigms like supervised fine-tuning or distillation. Spanning a wide range of
mathematical topics, DeepMath-103K promotes the development of generalizable
reasoning. We demonstrate that models trained on DeepMath-103K achieve
significant improvements on challenging mathematical benchmarks, validating its
effectiveness. We release DeepMath-103K publicly to facilitate community
progress in building more capable AI reasoning systems:
https://github.com/zwhe99/DeepMath.
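
To illustrate what "a verifiable final answer, enabling rule-based RL" can mean in practice, the following is a minimal sketch of a rule-based reward function. It is not the verifier used by the authors. It assumes, hypothetically, that the model writes its final answer inside `\boxed{...}` and that a normalized string match (with plain numbers compared as exact fractions) is an acceptable correctness check. The names `extract_boxed`, `normalize`, and `rule_based_reward` are illustrative only.

```python
from fractions import Fraction


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None.

    Assumption: the final answer is wrapped in \\boxed{...}. Nested braces
    (for example \\frac{1}{2}) are handled by tracking brace depth.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def normalize(answer):
    """Canonicalize an answer string for comparison.

    Plain numbers such as "0.5" or "1/2" are converted to exact fractions;
    anything else (for example LaTeX) is compared as a whitespace-free string.
    """
    cleaned = answer.strip().replace(" ", "").rstrip(".")
    try:
        return str(Fraction(cleaned))
    except (ValueError, ZeroDivisionError):
        return cleaned


def rule_based_reward(response, reference):
    """Return 1.0 if the boxed answer matches the reference, else 0.0."""
    predicted = extract_boxed(response)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0
```

A binary reward of this kind needs no learned reward model, which is why a dataset with checkable final answers suits RL training. A production verifier would likely need symbolic equivalence checking rather than string matching, since equivalent expressions can be written in many forms.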