DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

April 15, 2025
Authors: Zhiwei He, Tian Liang, Jiahao Xu, Qiuzhi Liu, Xingyu Chen, Yue Wang, Linfeng Song, Dian Yu, Zhenwen Liang, Wenxuan Wang, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu
cs.AI

Abstract

The capacity for complex mathematical reasoning is a key benchmark for artificial intelligence. While reinforcement learning (RL) applied to LLMs shows promise, progress is significantly hindered by the lack of large-scale training data that is sufficiently challenging, possesses verifiable answer formats suitable for RL, and is free from contamination with evaluation benchmarks. To address these limitations, we introduce DeepMath-103K, a new, large-scale dataset comprising approximately 103K mathematical problems, specifically designed to train advanced reasoning models via RL. DeepMath-103K is curated through a rigorous pipeline involving source analysis, stringent decontamination against numerous benchmarks, and filtering for high difficulty (primarily Levels 5-9), significantly exceeding existing open resources in challenge. Each problem includes a verifiable final answer, enabling rule-based RL, and three distinct R1-generated solutions suitable for diverse training paradigms like supervised fine-tuning or distillation. Spanning a wide range of mathematical topics, DeepMath-103K promotes the development of generalizable reasoning. We demonstrate that models trained on DeepMath-103K achieve significant improvements on challenging mathematical benchmarks, validating its effectiveness. We release DeepMath-103K publicly to facilitate community progress in building more capable AI reasoning systems: https://github.com/zwhe99/DeepMath.
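
The abstract notes that each problem carries a verifiable final answer, which is what makes rule-based RL possible. A minimal sketch of such a reward is shown here, assuming (this is not stated in the abstract) that the model writes its final answer inside `\boxed{...}` and that whitespace-insensitive string matching is an acceptable check. The function names `extract_boxed`, `normalize`, and `rule_based_reward` are hypothetical and are not taken from the DeepMath repository.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never balanced


def normalize(answer: str) -> str:
    """Assumed normalization: drop all whitespace, a trailing period, and case."""
    return "".join(answer.split()).rstrip(".").lower()


def rule_based_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the boxed answer matches the reference, else 0.0."""
    predicted = extract_boxed(completion)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0


if __name__ == "__main__":
    completion = r"Summing the series gives \boxed{\frac{1}{2}}."
    print(rule_based_reward(completion, r"\frac{1}{2}"))
```

Exact string matching is the simplest possible rule; a practical verifier would likely also need to treat mathematically equivalent forms (for example `0.5` and `\frac{1}{2}`) as equal, which this sketch does not attempt.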
