Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Abstract

The availability of high-quality data is one of the most important factors in improving the reasoning capability of LLMs. Existing work has demonstrated the effectiveness of creating more instruction data from seed questions or knowledge bases. Recent research indicates that continually scaling up data synthesis from strong models (e.g., GPT-4) can further elicit reasoning performance. Though promising, the open-source community still lacks high-quality data at scale, as well as scalable data synthesis methods with affordable costs. To address this, we introduce ScaleQuest, a scalable and novel data synthesis method that uses "small-size" (e.g., 7B) open-source models to generate questions from scratch, without the need for seed data with complex augmentation constraints. With the efficient ScaleQuest, we automatically constructed a mathematical reasoning dataset of 1 million problem-solution pairs, which is more effective than existing open-source datasets. It universally improves the performance of mainstream open-source models (i.e., Mistral, Llama3, DeepSeekMath, and Qwen2-Math), achieving gains of 29.2% to 46.4% on MATH. Notably, simply fine-tuning the Qwen2-Math-7B-Base model on our dataset can even surpass Qwen2-Math-7B-Instruct, a strong and well-aligned model trained on closed-source data, as well as proprietary models such as GPT-4-Turbo and Claude-3.5 Sonnet.

