ChatPaper.aiChatPaper

哥德尔证明者:开源自动定理证明的前沿模型

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

February 11, 2025
作者: Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin
cs.AI

摘要

我们介绍 Goedel-Prover,这是一个开源的大型语言模型(LLM),在数学问题的自动形式化证明生成中实现了最先进的性能。这一领域的关键挑战在于数学陈述和证明的形式化程度不足,我们通过以下方式来解决这一问题。我们训练陈述形式化器,将 Numina 中的自然语言数学问题翻译成形式化语言(Lean 4),创建了一个包含 1.64 百万个形式化陈述的数据集。LLM 被用来检查这些形式化陈述是否准确地保留了原始自然语言问题的内容。然后,我们通过训练一系列证明器来迭代地构建一个大型形式证明数据集。每个证明器成功地证明了许多之前的证明器无法证明的陈述,这些新的证明被添加到下一个证明器的训练集中。最终的证明器在整个证明生成过程中胜过了所有现有的开源模型。在 miniF2F 基准测试中,它实现了 57.6% 的成功率(Pass@32),比之前最好的开源模型高出了 7.6%。在 PutnamBench 上,Goedel-Prover 成功解决了 7 个问题(Pass@512),在排行榜上名列第一。此外,它为 Lean Workbook 问题生成了 29.7K 个形式证明,几乎是之前作品产生的 15.7K 的两倍。
English
We introduce Goedel-Prover, an open-source large language model (LLM) that achieves the state-of-the-art (SOTA) performance in automated formal proof generation for mathematical problems. The key challenge in this field is the scarcity of formalized math statements and proofs, which we tackle in the following ways. We train statement formalizers to translate the natural language math problems from Numina into formal language (Lean 4), creating a dataset of 1.64 million formal statements. LLMs are used to check that the formal statements accurately preserve the content of the original natural language problems. We then iteratively build a large dataset of formal proofs by training a series of provers. Each prover succeeds in proving many statements that the previous ones could not, and these new proofs are added to the training set for the next prover. The final prover outperforms all existing open-source models in whole-proof generation. On the miniF2F benchmark, it achieves a 57.6% success rate (Pass@32), exceeding the previous best open-source model by 7.6%. On PutnamBench, Goedel-Prover successfully solves 7 problems (Pass@512), ranking first on the leaderboard. Furthermore, it generates 29.7K formal proofs for Lean Workbook problems, nearly doubling the 15.7K produced by earlier works.

Summary

AI-Generated Summary

PDF82February 12, 2025