
Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading

April 16, 2025
Authors: Qianjin Yu, Keyu Wu, Zihan Chen, Chushu Zhang, Manlin Mei, Lingjun Huang, Fang Tan, Yongsheng Du, Kunlin Liu, Yurui Zhu
cs.AI

Abstract

Recently, DeepSeek-R1 (671B) (DeepSeek-AI et al., 2025) has demonstrated excellent reasoning ability on complex tasks and has publicly shared its methodology. This provides potentially high-quality chain-of-thought (CoT) data for stimulating the reasoning abilities of small-sized large language models (LLMs). To generate high-quality CoT data for different LLMs, we propose an efficient method for generating high-quality CoT data with LLM-adaptive question difficulty levels. First, we grade question difficulty according to the reasoning ability of the target LLM itself and construct an LLM-adaptive question database. Second, we sample from the question database according to a distribution over difficulty levels, and then use DeepSeek-R1 (671B) (DeepSeek-AI et al., 2025) to generate the corresponding high-quality CoT data with correct answers. By constructing CoT data with LLM-adaptive difficulty levels, we significantly reduce the cost of data generation and improve the efficiency of model supervised fine-tuning (SFT). Finally, we validate the effectiveness and generalizability of the proposed method on complex mathematical competitions and code generation tasks. Notably, with only 2k high-quality mathematical CoT samples, our ZMath-32B surpasses DeepSeek-Distill-32B on math reasoning tasks. Similarly, with only 2k high-quality code CoT samples, our ZCode-32B surpasses DeepSeek-Distill-32B on code reasoning tasks.
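The abstract describes a two-stage pipeline: grade each question by the target LLM's own ability to solve it, then sample from the resulting database according to a target difficulty distribution before querying DeepSeek-R1 for CoT traces. Below is a minimal Python sketch of those two stages under stated assumptions: the pass-rate grading criterion, the three difficulty levels, the thresholds, and all names (`grade_question`, `build_adaptive_database`, `sample_by_difficulty`, `solve_fn`) are illustrative, not the paper's actual implementation.

```python
import random
from collections import defaultdict

def grade_question(question, solve_fn, n_attempts=8):
    # Difficulty is defined relative to the target LLM itself:
    # run the model several times and measure its pass rate.
    # (Pass-rate grading and these thresholds are assumptions.)
    passes = sum(bool(solve_fn(question)) for _ in range(n_attempts))
    pass_rate = passes / n_attempts
    if pass_rate >= 0.75:
        return "easy"
    if pass_rate >= 0.25:
        return "medium"
    return "hard"

def build_adaptive_database(questions, solve_fn):
    # Bucket questions into an LLM-adaptive difficulty database.
    db = defaultdict(list)
    for q in questions:
        db[grade_question(q, solve_fn)].append(q)
    return db

def sample_by_difficulty(db, distribution, k):
    # Draw about k questions following a target difficulty
    # distribution, e.g. {"easy": 0.2, "medium": 0.3, "hard": 0.5}.
    sampled = []
    for level, frac in distribution.items():
        pool = db.get(level, [])
        n = min(round(k * frac), len(pool))
        sampled.extend(random.sample(pool, n))
    return sampled
```

In this sketch, `solve_fn` stands in for a call to the target (student) model plus an answer checker; skewing the sampling distribution toward questions the student cannot yet solve is one way such a pipeline could concentrate a small budget (e.g., the paper's 2k samples) before handing the selected questions to DeepSeek-R1 for CoT generation.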
