DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Abstract

The capacity for complex mathematical reasoning is a key benchmark for artificial intelligence. While reinforcement learning (RL) applied to large language models (LLMs) shows promise, progress is significantly hindered by the lack of large-scale training data that is sufficiently challenging, has verifiable answer formats suitable for RL, and is free from contamination with evaluation benchmarks. To address these limitations, we introduce DeepMath-103K, a new large-scale dataset of approximately 103K mathematical problems, designed specifically for training advanced reasoning models via RL. DeepMath-103K is curated through a rigorous pipeline involving source analysis, stringent decontamination against numerous benchmarks, and filtering for high difficulty (primarily Levels 5-9), making it significantly more challenging than existing open resources. Each problem includes a verifiable final answer, enabling rule-based RL, and three distinct R1-generated solutions suitable for diverse training paradigms such as supervised fine-tuning or distillation. Spanning a wide range of mathematical topics, DeepMath-103K promotes the development of generalizable reasoning. We demonstrate that models trained on DeepMath-103K achieve significant improvements on challenging mathematical benchmarks, validating its effectiveness. We release DeepMath-103K publicly to facilitate community progress in building more capable AI reasoning systems: https://github.com/zwhe99/DeepMath.
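
To illustrate what a verifiable final answer makes possible for rule-based RL, here is a minimal sketch of a binary reward function. It is not the authors' implementation: the `\boxed{...}` answer convention, the exact-string comparison, and the names `extract_boxed`, `normalize`, and `rule_based_reward` are all assumptions made for this example. A practical verifier would likely also need to check mathematical equivalence (for instance, `0.5` versus `\frac{1}{2}`), which this sketch does not attempt.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Assumption: the model marks its final answer with \\boxed{...}.
    Braces are matched by counting so nested LaTeX such as
    \\frac{1}{2} is captured whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def normalize(answer: str) -> str:
    """Crude normalization (hypothetical): drop surrounding $ signs,
    a trailing period, and all whitespace."""
    answer = answer.strip().strip("$").rstrip(".")
    return re.sub(r"\s+", "", answer)


def rule_based_reward(response: str, reference: str) -> float:
    """Return 1.0 if the extracted answer matches the reference, else 0.0."""
    predicted = extract_boxed(response)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0
```

Because the reward depends only on the final answer and a fixed rule, it needs no learned reward model. That is the property the abstract refers to when it says each problem's verifiable answer enables rule-based RL.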
