Z1: Efficient Test-time Scaling with Code
April 1, 2025
Authors: Zhaojian Yu, Yinghao Wu, Yilun Zhao, Arman Cohan, Xiao-Ping Zhang
cs.AI
Abstract
Large Language Models (LLMs) can achieve stronger complex problem-solving
through test-time compute scaling, but this typically comes at the cost of
longer contexts and many more reasoning tokens. In this paper, we propose an
efficient test-time scaling method that trains LLMs on code-related reasoning
trajectories, enabling them to cut excess thinking tokens while maintaining
performance. First, we create Z1-Code-Reasoning-107K, a curated dataset of
simple and complex coding problems paired with short and long solution
trajectories, respectively. Second, we present a novel Shifted Thinking Window
that mitigates overthinking overhead by removing context-delimiting tags
(e.g., <think>...</think>) and capping reasoning tokens. Trained on both long
and short trajectory data and equipped with the Shifted Thinking Window, our
model, Z1-7B, adjusts its reasoning depth to the complexity of each problem
and exhibits efficient test-time scaling across different reasoning tasks,
matching R1-Distill-Qwen-7B's performance with about 30% of its average
thinking tokens. Notably, although fine-tuned only on code trajectories,
Z1-7B generalizes to broader reasoning tasks (47.5% on GPQA Diamond). Our
analysis of how efficient reasoning is elicited also provides valuable
insights for future research.
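
To make the Shifted Thinking Window concrete, below is a minimal sketch of
one way such a decoding loop could look: generation runs without
<think>...</think> delimiters under a hard token cap, and if the cap is hit,
a hint is appended so the model concludes from its partial reasoning. This is
our reading of the abstract, not the authors' released code; the checkpoint
path, cap value, and hint wording are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Z1-7B-checkpoint"  # placeholder: substitute the actual checkpoint
THINKING_CAP = 4096            # assumed max reasoning tokens before the "shift"

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

def shifted_thinking_generate(problem: str) -> str:
    # No <think>...</think> delimiters: reasoning and answer share one context,
    # so the model can stop early on simple problems instead of filling a
    # fixed-size thinking block.
    prompt = tok.apply_chat_template(
        [{"role": "user", "content": problem}],
        tokenize=False,
        add_generation_prompt=True,
    )
    enc = tok(prompt, return_tensors="pt")
    out = model.generate(**enc, max_new_tokens=THINKING_CAP)
    new_tokens = out.shape[1] - enc.input_ids.shape[1]
    text = tok.decode(out[0, enc.input_ids.shape[1]:], skip_special_tokens=True)

    # If the budget was exhausted before an answer emerged, append a hint
    # (wording assumed) that pushes the model to answer from its partial trace.
    if new_tokens >= THINKING_CAP:
        hint = ("\nThe thinking budget is exhausted; based on the reasoning "
                "so far, the final answer is:")
        enc2 = tok(prompt + text + hint, return_tensors="pt")
        out2 = model.generate(**enc2, max_new_tokens=256)
        text += hint + tok.decode(out2[0, enc2.input_ids.shape[1]:],
                                  skip_special_tokens=True)
    return text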