

Reasoning Models Can Be Effective Without Thinking

April 14, 2025
作者: Wenjie Ma, Jingxuan He, Charlie Snell, Tyler Griggs, Sewon Min, Matei Zaharia
cs.AI

Abstract

Recent LLMs have significantly improved reasoning capabilities, primarily by including an explicit, lengthy Thinking process as part of generation. In this paper, we question whether this explicit thinking is necessary. Using the state-of-the-art DeepSeek-R1-Distill-Qwen, we find that bypassing the thinking process via simple prompting, denoted as NoThinking, can be surprisingly effective. When controlling for the number of tokens, NoThinking outperforms Thinking across a diverse set of seven challenging reasoning datasets--including mathematical problem solving, formal theorem proving, and coding--especially in low-budget settings, e.g., 51.3 vs. 28.9 on AMC 23 with 700 tokens. Notably, the performance of NoThinking becomes more competitive with pass@k as k increases. Building on this observation, we demonstrate that a parallel scaling approach that uses NoThinking to generate N outputs independently and aggregates them is highly effective. For aggregation, we use task-specific verifiers when available, or we apply simple best-of-N strategies such as confidence-based selection. Our method outperforms a range of baselines with similar latency using Thinking, and is comparable to Thinking with significantly longer latency (up to 9x). Together, our research encourages a reconsideration of the necessity of lengthy thinking processes, while also establishing a competitive reference for achieving strong reasoning performance in low-budget settings or at low latency using parallel scaling.
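The parallel-scaling recipe described above samples N independent NoThinking outputs and then aggregates them, falling back to a simple best-of-N rule such as confidence-based selection when no task-specific verifier is available. The sketch below illustrates only that aggregation step; the `Candidate` structure and the confidence measure (mean token log-probability) are illustrative assumptions, not the paper's exact implementation.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Candidate:
    answer: str                  # final answer extracted from one sampled output
    token_logprobs: list[float]  # per-token log-probabilities reported by the model

def confidence(c: Candidate) -> float:
    # Mean token log-probability as a simple confidence proxy (an assumption,
    # not necessarily the paper's exact scoring rule).
    return sum(c.token_logprobs) / len(c.token_logprobs)

def best_of_n(candidates: list[Candidate]) -> str:
    # Confidence-based selection: keep the single highest-confidence answer.
    return max(candidates, key=confidence).answer

def majority_vote(candidates: list[Candidate]) -> str:
    # Alternative aggregation when answers are directly comparable strings.
    return Counter(c.answer for c in candidates).most_common(1)[0][0]
```

With a task-specific verifier (e.g., a proof checker or unit tests), the same N samples would instead be filtered by the verifier, which is the stronger option when one exists.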

