

Z1: Efficient Test-time Scaling with Code

April 1, 2025
Authors: Zhaojian Yu, Yinghao Wu, Yilun Zhao, Arman Cohan, Xiao-Ping Zhang
cs.AI

Abstract

Large Language Models (LLMs) can achieve enhanced complex problem-solving through test-time compute scaling, yet this often entails longer contexts and substantial reasoning-token costs. In this paper, we propose an efficient test-time scaling method that trains LLMs on code-related reasoning trajectories, enabling them to reduce excess thinking tokens while maintaining performance. First, we create Z1-Code-Reasoning-107K, a curated dataset of simple and complex coding problems paired with short and long solution trajectories. Second, we present a novel Shifted Thinking Window that mitigates overthinking overhead by removing context-delimiting tags (e.g., <think>...</think>) and capping reasoning tokens. Trained on long and short trajectory data and equipped with the Shifted Thinking Window, our model, Z1-7B, adjusts its reasoning level to match problem complexity and exhibits efficient test-time scaling across diverse reasoning tasks, matching R1-Distill-Qwen-7B's performance with about 30% of its average thinking tokens. Notably, although fine-tuned only on code trajectories, Z1-7B generalizes to broader reasoning tasks (47.5% on GPQA Diamond). Our analysis of efficient reasoning elicitation also provides valuable insights for future research.
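The Shifted Thinking Window described in the abstract can be pictured as a capped decode loop: no <think>...</think> delimiters, a hard budget on reasoning tokens, and a forced shift into the answer phase when the budget runs out. The sketch below is illustrative only; `generate_step` is a toy stand-in for one LLM decoding step, and the budget value and answer-trigger phrase are assumptions, not the paper's exact implementation.

```python
def generate_step(tokens):
    """Hypothetical stand-in for one LLM decoding step.

    A real implementation would sample the next token from a model;
    this toy version emits "reason" tokens until a fixed length,
    then signals end of sequence.
    """
    return "reason" if len(tokens) < 50 else "<eos>"


def shifted_thinking_decode(prompt_tokens, max_thinking_tokens=16,
                            answer_trigger=("The", "final", "answer", "is")):
    """Decode without <think>...</think> delimiters, capping reasoning tokens.

    If the model does not finish within the budget, append a trigger
    phrase that shifts it to emitting the final answer. The trigger
    wording here is an assumption for illustration.
    """
    tokens = list(prompt_tokens)
    thinking = 0
    while thinking < max_thinking_tokens:
        tok = generate_step(tokens)
        if tok == "<eos>":
            return tokens, False  # finished within the thinking budget
        tokens.append(tok)
        thinking += 1
    # Budget exhausted: force a shift into the answering phase.
    tokens.extend(answer_trigger)
    return tokens, True


out, truncated = shifted_thinking_decode(["solve:"], max_thinking_tokens=8)
```

Here the reasoning budget is exhausted after 8 tokens, so the loop appends the trigger phrase and hands control back to answer generation.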
