The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

February 12, 2025
Authors: Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya Desai, Ion Stoica, Ana Klimovic, Graham Neubig, Joseph E. Gonzalez
cs.AI

Abstract

Large Reasoning Models (LRMs) represent a breakthrough in AI problem-solving capabilities, but their effectiveness in interactive environments can be limited. This paper introduces and analyzes overthinking in LRMs, a phenomenon in which models favor extended internal reasoning chains over interaction with the environment. Through experiments on software engineering tasks using SWE Bench Verified, we observe three recurring patterns: Analysis Paralysis, Rogue Actions, and Premature Disengagement. We propose a framework to study these behaviors, which correlates with human expert assessments, and analyze 4018 trajectories. We observe that higher overthinking scores correlate with decreased performance, and that reasoning models exhibit stronger tendencies toward overthinking than non-reasoning models. Our analysis reveals that simple efforts to mitigate overthinking in agentic environments, such as selecting the solution with the lower overthinking score, can improve model performance by almost 30% while reducing computational costs by 43%. These results suggest that mitigating overthinking has strong practical implications. We suggest that overthinking tendencies could be mitigated by leveraging native function-calling capabilities and selective reinforcement learning. We also open-source our evaluation framework and dataset to facilitate research in this direction at https://github.com/AlexCuadron/Overthinking.
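
The mitigation mentioned in the abstract, selecting the solution with the lower overthinking score, can be illustrated with a minimal sketch. It assumes each candidate solution already carries a numeric overthinking score. The abstract does not say how scores are computed or how many candidates are compared, so the `Candidate` type, its field names, and the sample scores below are hypothetical and not taken from the authors' framework.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """One candidate solution (hypothetical structure, for illustration only)."""
    patch: str                 # the proposed solution, e.g. a code patch
    overthinking_score: float  # assumed to be supplied by some external scorer


def select_lowest_overthinking(candidates: Sequence[Candidate]) -> Candidate:
    """Return the candidate with the lowest overthinking score.

    Ties are resolved in favor of the earliest candidate, which is how
    Python's built-in min() behaves.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return min(candidates, key=lambda c: c.overthinking_score)


# Made-up scores, purely to show the selection step.
runs = [
    Candidate(patch="patch-a", overthinking_score=7.0),
    Candidate(patch="patch-b", overthinking_score=2.0),
]
chosen = select_lowest_overthinking(runs)
print(chosen.patch)
```

Only the selection step is shown here; producing the scores themselves is the job of the evaluation framework the paper describes.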

February 17, 2025