Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
April 9, 2025
Authors: Chenrui Fan, Ming Li, Lichao Sun, Tianyi Zhou
cs.AI
Abstract
We find that the response length of reasoning LLMs, whether trained by reinforcement learning or supervised learning, increases drastically on ill-posed questions with missing premises (MiP), ending in redundant and ineffective thinking. This newly introduced scenario greatly exacerbates the general overthinking issue, a failure mode we name MiP-Overthinking. Such failures run counter to the "test-time scaling law," yet they appear widely across the multiple MiP datasets we curated, indicating the harm of cheap overthinking and a lack of critical thinking. Surprisingly, LLMs not specifically trained for reasoning perform much better in the MiP scenario, producing far shorter responses that quickly identify the ill-posed queries. This implies a critical flaw in the current training recipes for reasoning LLMs: they do not adequately encourage efficient thinking, leading to the overuse of thinking patterns. To investigate the reasons behind such failures, we conduct fine-grained analyses of reasoning length, overthinking patterns, and the location of critical thinking across different types of LLMs. Moreover, an extended ablation study reveals that overthinking is contagious: it spreads through distillation of reasoning models' responses. These results deepen the understanding of overthinking and offer new insights into mitigating the problem.
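While the paper reports empirical findings rather than code, the MiP probe it describes can be sketched in a few lines: take a well-posed question, delete a premise the answer depends on, and compare the model's response lengths on the two variants. The sketch below is illustrative only; `query_model` is a hypothetical stand-in for an actual LLM API, and the sample question is ours, not drawn from the paper's curated datasets.

```python
# Minimal sketch of the Missing-Premise (MiP) probe described in the abstract.
# Assumptions: `query_model` is a hypothetical placeholder for a real LLM call;
# the sample question is illustrative, not from the paper's datasets.

def query_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    return "(model response placeholder)"

# A well-posed question and its MiP variant, built by deleting the premise
# ("Bob has 3 apples") that the answer actually depends on.
well_posed = (
    "Alice has 5 apples and Bob has 3 apples. "
    "How many apples do they have together?"
)
mip = "Alice has 5 apples. How many apples do they have together?"

for label, prompt in [("well-posed", well_posed), ("MiP", mip)]:
    response = query_model(prompt)
    # The paper's key signal: reasoning models produce drastically longer
    # responses on the MiP variant instead of flagging the missing premise.
    print(f"{label}: {len(response.split())} words")
```

Under the paper's findings, a model with intact critical thinking should answer the MiP variant briefly, flagging the missing premise, whereas MiP-Overthinking manifests as a far longer chain of thought that never resolves the question.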