O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

Abstract

Recently, long-thought reasoning LLMs, such as OpenAI's O1, have adopted extended reasoning processes similar to how humans ponder complex problems. This reasoning paradigm significantly enhances the model's problem-solving abilities and has achieved promising results. However, the long-thought reasoning process substantially increases inference time. A pressing challenge is to reduce the inference overhead of long-thought LLMs while preserving accuracy. In this paper, we experimentally demonstrate that long-thought reasoning models struggle to allocate token budgets effectively based on problem difficulty and reasoning redundancies. To address this, we propose Length-Harmonizing Fine-Tuning (O1-Pruner), which aims to minimize reasoning overhead while maintaining accuracy. This fine-tuning method first estimates the LLM's baseline performance through pre-sampling and then uses RL-style fine-tuning to encourage the model to generate shorter reasoning processes under accuracy constraints. This allows the model to reason efficiently with lower redundancy while maintaining accuracy. Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge. Our code is coming soon at https://github.com/StarDewXXX/O1-Pruner
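The two-stage idea in the abstract (pre-sample to estimate a baseline, then reward shorter outputs subject to accuracy) can be illustrated with a minimal Python sketch. The abstract does not give the reward formula, so everything here is an assumption made for illustration: the function names `baseline_stats` and `length_harmonizing_reward`, the weight `lam`, and the specific form "relative length saving plus a weighted accuracy difference" are hypothetical and may differ from the authors' actual objective.

```python
from statistics import mean


def baseline_stats(ref_lengths, ref_correct):
    """Estimate a per-problem baseline from pre-sampled reference outputs.

    ref_lengths: token counts of the reference model's sampled solutions.
    ref_correct: booleans, whether each sampled solution was correct.
    Returns (mean length, mean accuracy).
    """
    if not ref_lengths or len(ref_lengths) != len(ref_correct):
        raise ValueError("need equally sized, non-empty samples")
    base_len = mean(ref_lengths)
    base_acc = mean(1.0 if c else 0.0 for c in ref_correct)
    return base_len, base_acc


def length_harmonizing_reward(length, correct, base_len, base_acc, lam=2.0):
    """Hypothetical reward: favor outputs shorter than the baseline,
    and penalize outputs that fall below the baseline accuracy.

    lam is an assumed trade-off weight, not a value from the paper.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    length_term = base_len / length - 1.0  # > 0 when shorter than baseline
    acc_term = (1.0 if correct else 0.0) - base_acc  # > 0 when better than baseline
    return length_term + lam * acc_term


# Usage: three pre-sampled solutions for one problem, then score two candidates.
base_len, base_acc = baseline_stats([100, 200, 300], [True, False, True])
short_and_right = length_harmonizing_reward(100, True, base_len, base_acc)
long_and_wrong = length_harmonizing_reward(400, False, base_len, base_acc)
```

Under these assumptions, a short correct answer scores above a long incorrect one, which is the behavior an RL-style update would then reinforce. In a real training loop the reward would feed a policy-gradient or similar objective, which this sketch leaves out.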
