Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Abstract

Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thinking. However, we identify a phenomenon we term underthinking, where o1-like LLMs frequently switch between different reasoning thoughts without sufficiently exploring promising paths to reach a correct solution. This behavior leads to inadequate depth of reasoning and decreased performance, particularly on challenging mathematical problems. To systematically analyze this issue, we conduct experiments on three challenging test sets and two representative open-source o1-like models, revealing that frequent thought switching correlates with incorrect responses. We introduce a novel metric to quantify underthinking by measuring token efficiency in incorrect answers. To address underthinking, we propose a decoding strategy with a thought switching penalty (TIP) that discourages premature transitions between thoughts, encouraging deeper exploration of each reasoning path. Experimental results demonstrate that our approach improves accuracy across challenging datasets without requiring model fine-tuning. Our findings contribute to understanding reasoning inefficiencies in o1-like LLMs and offer a practical solution to enhance their problem-solving capabilities.
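As a rough illustration of how a decoding-time penalty on thought switching could work, the sketch below lowers the logits of tokens that tend to open a new thought while the current thought is still young. It is not the paper's implementation: the abstract does not specify the mechanism, so the function names, the `penalty` and `window` parameters, and the idea of identifying switches by a fixed set of token ids are all assumptions made for illustration.

```python
from typing import List, Set


def apply_switch_penalty(
    logits: List[float],
    switch_token_ids: Set[int],
    steps_in_thought: int,
    penalty: float = 3.0,  # hypothetical strength, not a value from the paper
    window: int = 4,       # hypothetical number of steps during which switching is discouraged
) -> List[float]:
    """Return a copy of `logits` with switch tokens penalized early in a thought."""
    if steps_in_thought >= window:
        # The current thought has been explored long enough; leave logits untouched.
        return list(logits)
    # Subtract the penalty only from tokens assumed to start a new thought.
    return [
        value - penalty if token_id in switch_token_ids else value
        for token_id, value in enumerate(logits)
    ]


def greedy_pick(logits: List[float]) -> int:
    """Pick the index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy vocabulary of three tokens; token 1 is assumed to mark a thought switch.
toy_logits = [1.0, 2.5, 2.0]
early = greedy_pick(apply_switch_penalty(toy_logits, {1}, steps_in_thought=1))
late = greedy_pick(apply_switch_penalty(toy_logits, {1}, steps_in_thought=10))
```

In this toy setup the switch token would be chosen without the penalty, but early in a thought the penalty makes the model continue with token 2 instead; once the window has passed, the switch token is selected as usual. Because the adjustment happens only at decoding time, no model weights change, which is consistent with the abstract's claim that no fine-tuning is required.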
