Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models

February 6, 2025
作者: Xiao-Wen Yang, Xuan-Yi Zhu, Wen-Da Wei, Ding-Chu Zhang, Jie-Jing Shao, Zhi Zhou, Lan-Zhe Guo, Yu-Feng Li
cs.AI

Abstract

The integration of slow-thinking mechanisms into large language models (LLMs) offers a promising way toward achieving Level 2 AGI Reasoners, as exemplified by systems like OpenAI's o1. However, several significant challenges remain, including inefficient overthinking and an overreliance on auxiliary reward models. We point out that these limitations stem from LLMs' inability to internalize the search process, a key component of effective reasoning. A critical step toward addressing this issue is enabling LLMs to autonomously determine when and where to backtrack, a fundamental operation in traditional search algorithms. To this end, we propose a self-backtracking mechanism that equips LLMs with the ability to backtrack during both training and inference. This mechanism not only enhances reasoning ability but also efficiency by transforming slow-thinking processes into fast-thinking through self-improvement. Empirical evaluations demonstrate that our proposal significantly enhances the reasoning capabilities of LLMs, achieving a performance gain of over 40 percent compared to the optimal-path supervised fine-tuning method. We believe this study introduces a novel and promising pathway for developing more advanced and robust Reasoners.
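The abstract describes enabling the model to decide when and where to backtrack during reasoning. As a rough illustration only, the following Python sketch shows that general idea under assumptions not taken from the paper: a decoding-style loop in which a proposer (standing in for the LLM) may emit a special backtrack marker, which undoes the most recent step before generation continues. The names BACKTRACK, solve_with_backtracking, and scripted_proposer are hypothetical and do not reflect the authors' actual training or inference procedure.

```python
from typing import Callable, Iterable, List

# Hypothetical marker the "model" emits to signal it wants to step back.
BACKTRACK = "<backtrack>"

def solve_with_backtracking(
    propose: Callable[[List[str]], str],
    is_solution: Callable[[List[str]], bool],
    max_expansions: int = 50,
) -> List[str]:
    """Decoding loop where the proposer itself signals when to step back."""
    path: List[str] = []
    for _ in range(max_expansions):
        if is_solution(path):
            return path
        step = propose(path)
        if step == BACKTRACK:
            if path:
                path.pop()  # undo the latest step; the proposer tries a different continuation
            continue
        path.append(step)
    return path  # best partial path if the expansion budget runs out

def scripted_proposer(script: Iterable[str]) -> Callable[[List[str]], str]:
    """Stand-in for an LLM call: replays a fixed sequence of 'model outputs'."""
    it = iter(script)
    return lambda prefix: next(it, BACKTRACK)

if __name__ == "__main__":
    # Scripted run: the "model" takes a wrong step, emits the backtrack marker,
    # then completes the solution from the restored prefix.
    propose = scripted_proposer(
        ["expand_terms", "wrong_turn", BACKTRACK, "combine_like_terms", "final_answer"]
    )
    result = solve_with_backtracking(propose, lambda p: p[-1:] == ["final_answer"])
    print(result)  # ['expand_terms', 'combine_like_terms', 'final_answer']
```

In this toy run the path grows to ["expand_terms", "wrong_turn"], the backtrack marker removes "wrong_turn", and generation resumes from the shorter prefix, which is the behavior the abstract attributes to a backtracking-capable reasoner.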
