O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
January 22, 2025
Authors: Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao
cs.AI
Abstract
Recently, long-thought reasoning LLMs, such as OpenAI's O1, have adopted extended
reasoning processes similar to how humans ponder complex problems. This
reasoning paradigm significantly enhances the model's problem-solving abilities
and has achieved promising results. However, the long-thought reasoning process
leads to a substantial increase in inference time. A pressing challenge is
reducing the inference overhead of long-thought LLMs while ensuring accuracy.
In this paper, we experimentally demonstrate that long-thought reasoning models
struggle to effectively allocate token budgets based on problem difficulty and
reasoning redundancies. To address this, we propose Length-Harmonizing
Fine-Tuning (O1-Pruner), aiming at minimizing reasoning overhead while
maintaining accuracy. This effective fine-tuning method first estimates the
LLM's baseline performance through pre-sampling and then uses RL-style
fine-tuning to encourage the model to generate shorter reasoning processes
under accuracy constraints. This allows the model to achieve efficient
reasoning with lower redundancy while maintaining accuracy. Experiments on
various mathematical reasoning benchmarks show that O1-Pruner not only
significantly reduces inference overhead but also achieves higher accuracy,
providing a novel and promising solution to this challenge. Our code is coming
soon at https://github.com/StarDewXXX/O1-Pruner.
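
The two-stage idea described in the abstract (estimate a baseline by pre-sampling, then reward shorter outputs subject to an accuracy constraint) can be illustrated with a small sketch. The abstract does not give the reward formula, so the form below is an assumption made for illustration only: a length term that compares a sample against the mean length of the pre-sampled reference outputs, plus a weighted accuracy term relative to the reference accuracy. The function name, the parameter names, and the weight `lam` are all hypothetical and are not taken from the authors' code.

```python
from statistics import mean


def length_harmonizing_reward(sample_len, sample_correct,
                              ref_lens, ref_correct, lam=2.0):
    """Illustrative reward for one sampled solution (assumed form, not the paper's).

    sample_len     -- token count of the new sample
    sample_correct -- whether the new sample's final answer is correct
    ref_lens       -- token counts of pre-sampled reference solutions
    ref_correct    -- correctness flags of the reference solutions
    lam            -- hypothetical weight on the accuracy term
    """
    if sample_len <= 0:
        raise ValueError("sample_len must be positive")
    if not ref_lens or len(ref_lens) != len(ref_correct):
        raise ValueError("reference lists must be non-empty and aligned")

    # Stage 1: baseline statistics estimated from pre-sampled outputs.
    baseline_len = mean(ref_lens)
    baseline_acc = mean(1.0 if c else 0.0 for c in ref_correct)

    # Stage 2: positive when the sample is shorter than the baseline,
    # negative when it is longer.
    length_term = baseline_len / sample_len - 1.0

    # Penalize samples that fall below the baseline accuracy.
    accuracy_term = (1.0 if sample_correct else 0.0) - baseline_acc

    return length_term + lam * accuracy_term


if __name__ == "__main__":
    ref_lens = [400, 600]
    ref_correct = [True, False]
    # A short, correct sample is rewarded; a long, wrong one is penalized.
    print(length_harmonizing_reward(250, True, ref_lens, ref_correct))
    print(length_harmonizing_reward(1000, False, ref_lens, ref_correct))
```

In an RL-style fine-tuning loop, a scalar like this would serve as the per-sample signal in a policy-gradient update; the choice of optimizer and the value of `lam` are left open here because the abstract does not specify them.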