Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models

Abstract

The integration of slow-thinking mechanisms into large language models (LLMs) offers a promising route toward Level 2 AGI Reasoners, as exemplified by systems like OpenAI's o1. However, several significant challenges remain, including inefficient overthinking and an overreliance on auxiliary reward models. We point out that these limitations stem from LLMs' inability to internalize the search process, a key component of effective reasoning. A critical step toward addressing this issue is enabling LLMs to autonomously determine when and where to backtrack, a fundamental operation in traditional search algorithms. To this end, we propose a self-backtracking mechanism that equips LLMs with the ability to backtrack during both training and inference. This mechanism enhances not only reasoning ability but also efficiency, by transforming slow-thinking processes into fast thinking through self-improvement. Empirical evaluations demonstrate that our proposal significantly enhances the reasoning capabilities of LLMs, achieving a performance gain of over 40 percent compared to the optimal-path supervised fine-tuning method. We believe this study introduces a novel and promising pathway for developing more advanced and robust Reasoners.
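For readers unfamiliar with the term, the sketch below illustrates backtracking as it appears in a traditional search algorithm, the operation the abstract refers to. It is not the paper's self-backtracking mechanism: the toy problem (finding a subset of positive integers that sums to a target), the function name `find_subset`, and the backtrack counter are all assumptions introduced purely for illustration.

```python
from typing import List, Optional, Tuple


def find_subset(numbers: List[int], target: int) -> Tuple[Optional[List[int]], int]:
    """Depth-first search for a subset of positive integers summing to `target`.

    Returns (solution or None, number of backtracking steps taken).
    Hypothetical toy example; assumes all numbers are positive.
    """
    path: List[int] = []   # the partial solution currently being explored
    backtracks = 0         # how many times a choice was undone

    def dfs(start: int, remaining: int) -> bool:
        nonlocal backtracks
        if remaining == 0:
            return True  # the current path is a complete solution
        for i in range(start, len(numbers)):
            if numbers[i] > remaining:
                continue  # prune: this choice can never fit
            path.append(numbers[i])           # commit to a choice
            if dfs(i + 1, remaining - numbers[i]):
                return True
            path.pop()                        # backtrack: undo the choice
            backtracks += 1
        return False  # dead end: the caller must backtrack

    found = dfs(0, target)
    return (list(path) if found else None), backtracks


if __name__ == "__main__":
    print(find_subset([4, 5, 3], 8))
```

In this toy search, the decision of when and where to backtrack is hard-coded in the algorithm (undo the most recent choice at every dead end). The abstract's proposal is that the language model itself learns to make that decision.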
