O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

Abstract

Recently, long-thought reasoning LLMs, such as OpenAI's O1, have adopted extended reasoning processes similar to how humans ponder complex problems. This reasoning paradigm significantly enhances the model's problem-solving abilities and has achieved promising results. However, the long-thought reasoning process leads to a substantial increase in inference time. A pressing challenge is therefore to reduce the inference overhead of long-thought LLMs while preserving accuracy. In this paper, we experimentally demonstrate that long-thought reasoning models struggle to allocate token budgets effectively based on problem difficulty and reasoning redundancy. To address this, we propose Length-Harmonizing Fine-Tuning (O1-Pruner), which aims to minimize reasoning overhead while maintaining accuracy. This fine-tuning method first estimates the LLM's baseline performance through pre-sampling and then uses RL-style fine-tuning to encourage the model to generate shorter reasoning processes under accuracy constraints. This allows the model to reason efficiently with lower redundancy while maintaining accuracy. Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge. Our code is coming soon at https://github.com/StarDewXXX/O1-Pruner.
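The abstract describes two steps, pre-sampling to estimate a baseline and RL-style fine-tuning that favors shorter outputs under an accuracy constraint, but it does not state the objective. The following is a minimal sketch of one plausible reward shape, not the authors' implementation: the function names, the relative-length term, and the weight `lam` are all assumptions made for illustration.

```python
from statistics import mean


def estimate_baseline(samples):
    """Estimate per-problem baseline from pre-sampled reference outputs.

    `samples` is a list of (num_tokens, is_correct) pairs, assumed to be
    drawn from the reference model for a single problem (hypothetical format).
    """
    ref_len = mean(n for n, _ in samples)
    ref_acc = mean(1.0 if ok else 0.0 for _, ok in samples)
    return ref_len, ref_acc


def length_accuracy_reward(num_tokens, is_correct, ref_len, ref_acc, lam=2.0):
    """Assumed reward: positive when shorter than baseline, penalized when
    accuracy falls below the baseline accuracy. `lam` is a hypothetical weight.
    """
    length_term = ref_len / num_tokens - 1.0          # > 0 if shorter than baseline
    accuracy_term = (1.0 if is_correct else 0.0) - ref_acc
    return length_term + lam * accuracy_term


if __name__ == "__main__":
    # Toy pre-samples for one problem: (token count, correct?)
    ref_len, ref_acc = estimate_baseline(
        [(400, True), (600, True), (500, False), (500, True)]
    )
    for n, ok in [(250, True), (250, False), (1000, True)]:
        print(n, ok, length_accuracy_reward(n, ok, ref_len, ref_acc))
```

Under this assumed form, a short correct answer earns the highest reward, while a short but wrong answer is penalized, so shortening a solution only pays off when correctness is kept. In an RL-style loop this scalar would serve as the per-sample reward signal.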
