煉金術:透過符號變異來增強定理證明能力
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation
October 21, 2024
作者: Shaonan Wu, Shuai Lu, Yeyun Gong, Nan Duan, Ping Wei
cs.AI
摘要
撰寫正式證明即使對經驗豐富的專家來說也是具有挑戰性的。最近神經定理證明(NTP)的進展顯示了加速這一過程的潛力。然而,網絡上可用的正式語料庫相對於一般文本來說有限,這為NTP帶來了重大的數據稀缺挑戰。為了解決這個問題,本研究提出了Alchemy,一個通用的數據合成框架,通過符號變異構建正式定理。具體來說,對於Mathlib中的每個候選定理,我們識別所有可調用的定理,可以對其進行重寫或應用。隨後,我們通過用等價形式或前提替換陳述中的相應術語來變異候選定理。因此,我們的方法將Mathlib中的定理數量增加了一個數量級,從110k增加到6M。此外,我們對這個擴充語料庫進行了持續的預訓練和監督微調以供大型語言模型使用。實驗結果顯示了我們方法的有效性,在Leandojo基準測試中實現了5%的絕對性能提升。此外,我們的合成數據在超出分佈的miniF2F基準測試中實現了2.5%的絕對性能增益。為了提供更多見解,我們對合成數據組成和訓練範式進行了全面分析,為發展強大的定理證明器提供了有價值的指導。
English
Formal proofs are challenging to write even for experienced experts. Recent
progress in Neural Theorem Proving (NTP) shows promise in expediting this
process. However, the formal corpora available on the Internet are limited
compared to the general text, posing a significant data scarcity challenge for
NTP. To address this issue, this work proposes Alchemy, a general framework for
data synthesis that constructs formal theorems through symbolic mutation.
Specifically, for each candidate theorem in Mathlib, we identify all invocable
theorems that can be used to rewrite or apply to it. Subsequently, we mutate
the candidate theorem by replacing the corresponding term in the statement with
its equivalent form or antecedent. As a result, our method increases the number
of theorems in Mathlib by an order of magnitude, from 110k to 6M. Furthermore,
we perform continual pretraining and supervised finetuning on this augmented
corpus for large language models. Experimental results demonstrate the
effectiveness of our approach, achieving a 5% absolute performance improvement
on Leandojo benchmark. Additionally, our synthetic data achieve a 2.5% absolute
performance gain on the out-of-distribution miniF2F benchmark. To provide
further insights, we conduct a comprehensive analysis of synthetic data
composition and the training paradigm, offering valuable guidance for
developing a strong theorem prover.Summary
AI-Generated Summary