Leanabell-Prover: Posttraining Scaling in Formal Reasoning
April 8, 2025
Authors: Jingyuan Zhang, Qi Wang, Xingguang Ji, Yahui Liu, Yang Yue, Fuzheng Zhang, Di Zhang, Guorui Zhou, Kun Gai
cs.AI
Abstract
Recent advances in automated theorem proving (ATP) through LLMs have highlighted the potential of formal reasoning with Lean 4 code. However, ATP has not yet been revolutionized by posttraining scaling of the kind demonstrated by OpenAI's O1/O3 and DeepSeek's R1. In this work, we investigate the entire posttraining pipeline for ATP, aiming to align it with breakthroughs in natural-language reasoning models. To begin, we continually train current ATP models on a hybrid dataset, which consists of numerous statement-proof pairs together with additional data designed to instill cognitive behaviors that emulate human reasoning and hypothesis refinement. Next, we explore reinforcement learning using the outcome reward returned by the Lean 4 compiler. Through our designed continual-training and reinforcement-learning processes, we successfully improve existing formal provers, including both DeepSeek-Prover-v1.5 and Goedel-Prover, achieving state-of-the-art performance in whole-proof generation. For example, we achieve a 59.8% pass rate (pass@32) on the MiniF2F test set. This is an ongoing project, and we will progressively update our findings and release our data and training details.
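The reward signal described in the abstract is conceptually simple: a candidate proof is rewarded if and only if the Lean 4 compiler accepts it. Below is a minimal Python sketch of such a binary outcome reward; the function name `lean_outcome_reward`, the temporary-file handling, and the direct `lean` CLI invocation are illustrative assumptions, not the authors' implementation (real pipelines typically compile inside a `lake` project against Mathlib).

```python
import subprocess
import tempfile


def lean_outcome_reward(proof_source: str, timeout_s: int = 60) -> float:
    """Binary outcome reward: 1.0 iff the Lean 4 compiler accepts the proof."""
    # A proof containing `sorry` elaborates with only a warning (exit code 0),
    # so it must be rejected explicitly.
    if "sorry" in proof_source:
        return 0.0
    with tempfile.NamedTemporaryFile(mode="w", suffix=".lean", delete=False) as f:
        f.write(proof_source)
        path = f.name
    try:
        # Exit code 0 means the file elaborated without errors, i.e. the
        # kernel accepted every declaration in it.
        result = subprocess.run(["lean", path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
```

In an RL loop of the kind the abstract describes, this 0/1 signal would serve as the sole outcome reward, which is why verifier-backed domains like Lean 4 are attractive targets for posttraining scaling.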
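For the reported pass@32 metric, "pass@k" conventionally refers to the unbiased estimator of Chen et al. (2021): the probability that at least one of k sampled proofs verifies, estimated from n ≥ k samples of which c verified. A short sketch, with illustrative numbers not taken from the paper:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c verified correct."""
    if n - c < k:
        return 1.0  # every size-k subset of the n samples contains a success
    return 1.0 - comb(n - c, k) / comb(n, k)


# e.g. 64 proofs sampled for one problem, 5 verified by the Lean compiler:
print(pass_at_k(n=64, c=5, k=32))  # ~0.974
```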