

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

January 30, 2025
Authors: Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu
cs.AI

Abstract

Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thinking. However, we identify a phenomenon we term underthinking, where o1-like LLMs frequently switch between different reasoning thoughts without sufficiently exploring promising paths to reach a correct solution. This behavior leads to inadequate depth of reasoning and decreased performance, particularly on challenging mathematical problems. To systematically analyze this issue, we conduct experiments on three challenging test sets and two representative open-source o1-like models, revealing that frequent thought switching correlates with incorrect responses. We introduce a novel metric to quantify underthinking by measuring token efficiency in incorrect answers. To address underthinking, we propose a decoding strategy with a thought switching penalty (TIP) that discourages premature transitions between thoughts, encouraging deeper exploration of each reasoning path. Experimental results demonstrate that our approach improves accuracy across challenging datasets without requiring model fine-tuning. Our findings contribute to understanding reasoning inefficiencies in o1-like LLMs and offer a practical solution to enhance their problem-solving capabilities.
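The abstract names two mechanisms but does not give their formulas. The sketch below shows one plausible reading of each, and every detail is an assumption.

1. **Thought switching penalty.** The sketch subtracts a fixed penalty `alpha` from the logits of tokens assumed to open a new thought, such as a word like "Alternatively". The penalty applies only during the first `beta` tokens of the current thought.
2. **Underthinking metric.** Token inefficiency is formalized as the average of `1 - T_hat / T` over incorrect responses. Here `T_hat` is the number of tokens up to the end of the first correct thought, and `T` is the response length.

The function names, the parameters `switch_token_ids`, `alpha` and `beta`, and the toy vocabulary are all hypothetical illustrations, not the authors' code.

```python
import math
from typing import Sequence


def apply_switch_penalty(
    logits: Sequence[float],
    switch_token_ids: Sequence[int],
    tokens_since_thought_start: int,
    alpha: float,
    beta: int,
) -> list[float]:
    """Subtract a fixed penalty from the logits of thought-switch tokens.

    Hypothetical parameters (assumptions, not taken from the source):
      switch_token_ids: vocabulary ids assumed to open a new thought
      alpha: penalty strength subtracted from those logits
      beta: tokens after a thought starts during which the penalty applies
    """
    adjusted = list(logits)
    # Only discourage *premature* switches: the penalty is active inside the window.
    if tokens_since_thought_start < beta:
        for token_id in switch_token_ids:
            adjusted[token_id] -= alpha
    return adjusted


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def underthinking_score(responses: Sequence[tuple[int, int]]) -> float:
    """Average of 1 - T_hat / T over incorrect responses (assumed formalization).

    Each pair is (T_hat, T): tokens up to the end of the first correct thought,
    and total tokens in the response. Higher values mean more wasted tokens.
    """
    if not responses:
        return 0.0
    return sum(1 - t_hat / t for t_hat, t in responses) / len(responses)


if __name__ == "__main__":
    # Toy 3-token vocabulary; id 1 stands in for a switch word like "Alternatively".
    vocab_logits = [2.0, 1.0, 0.5]
    before = softmax(vocab_logits)
    after = softmax(apply_switch_penalty(vocab_logits, [1], 10, alpha=3.0, beta=100))
    print(f"P(switch) before: {before[1]:.3f}, after: {after[1]:.3f}")
    print(underthinking_score([(100, 400), (300, 300)]))  # prints 0.375
```

The penalty is time-limited rather than permanent. This matches the abstract's wording about *premature* transitions: under this assumption, the model can still change course once it has explored a thought for `beta` tokens.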
