Leanabell-Prover: Posttraining Scaling in Formal Reasoning
April 8, 2025
Authors: Jingyuan Zhang, Qi Wang, Xingguang Ji, Yahui Liu, Yang Yue, Fuzheng Zhang, Di Zhang, Guorui Zhou, Kun Gai
cs.AI
Abstract
Recent advances in automated theorem proving (ATP) through LLMs have highlighted the potential of formal reasoning with Lean 4 code. However, ATP has not yet been revolutionized by posttraining scaling of the kind demonstrated by OpenAI's O1/O3 and DeepSeek's R1. In this work, we investigate the entire posttraining pipeline for ATP, aiming to align it with breakthroughs in natural-language reasoning models. To begin, we continually train current ATP models on a hybrid dataset, which consists of numerous statement-proof pairs together with additional data designed to instill cognitive behaviors that emulate human reasoning and hypothesis refinement. Next, we explore reinforcement learning using the outcome reward returned by the Lean 4 compiler. Through our designed continual-training and reinforcement-learning processes, we successfully improve existing formal provers, including both DeepSeek-Prover-v1.5 and Goedel-Prover, achieving state-of-the-art performance in whole-proof generation. For example, we achieve a 59.8% pass rate (pass@32) on the MiniF2F test set. This is an ongoing project, and we will progressively update our findings and release our data and training details.
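The reward signal described in the abstract is conceptually simple: a candidate proof is rewarded if and only if the Lean 4 compiler accepts it. Below is a minimal Python sketch of such a binary outcome reward; the function name `lean_outcome_reward`, the temporary-file handling, and the direct `lean` CLI invocation are illustrative assumptions, not the authors' implementation (real pipelines typically compile inside a `lake` project against Mathlib).

```python
import subprocess
import tempfile


def lean_outcome_reward(proof_source: str, timeout_s: int = 60) -> float:
    """Binary outcome reward: 1.0 iff the Lean 4 compiler accepts the proof."""
    # A proof containing `sorry` elaborates with only a warning (exit code 0),
    # so it must be rejected explicitly.
    if "sorry" in proof_source:
        return 0.0
    with tempfile.NamedTemporaryFile(mode="w", suffix=".lean", delete=False) as f:
        f.write(proof_source)
        path = f.name
    try:
        # Exit code 0 means the file elaborated without errors, i.e. the
        # kernel accepted every declaration in it.
        result = subprocess.run(["lean", path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
```

In an RL loop of the kind the abstract describes, this 0/1 signal would serve as the sole outcome reward, which is why verifier-backed domains like Lean 4 are attractive targets for posttraining scaling.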
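For the reported pass@32 metric, "pass@k" conventionally refers to the unbiased estimator of Chen et al. (2021): the probability that at least one of k sampled proofs verifies, estimated from n ≥ k samples of which c verified. A short sketch, with illustrative numbers not taken from the paper:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c verified correct."""
    if n - c < k:
        return 1.0  # every size-k subset of the n samples contains a success
    return 1.0 - comb(n - c, k) / comb(n, k)


# e.g. 64 proofs sampled for one problem, 5 verified by the Lean compiler:
print(pass_at_k(n=64, c=5, k=32))  # ~0.974
```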