构建一个以证明为导向的程序员，比GPT-4o在数据稀缺情况下提高了64%

摘要

现有的语言模型在面向证明的编程方面存在数据稀缺问题，主要表现在两个方面：(1) 缺乏针对面向证明编程语言（如F*）的足够语料库，以及 (2) 缺乏大规模、项目级的面向证明实现，无法教会模型在执行面向证明编程时复杂的推理过程。我们提出了一种基于合成数据增强的项目级面向证明编程方法，旨在生成和修复证明。我们的方法通过合成基本的面向证明编程问题来解决数据稀缺问题，以提高对该语言的熟练程度；结合多样化的编码数据来引发推理能力，并在现有存储库中创建新的证明和修复数据。这种方法使语言模型能够合成和修复函数级和存储库级代码的证明。我们展示，我们经过微调的拥有140亿参数的模型PoPilot，在项目级面向证明编程中的性能超过了GPT-4o，并相对提高了64%的性能，同时通过修复GPT-4o的输出，使其性能比GPT-4o自我修复提高了54%。

English

Existing LMs struggle with proof-oriented programming due to data scarcity, which manifest in two key ways: (1) a lack of sufficient corpora for proof-oriented programming languages such as F*, and (2) the absence of large-scale, project-level proof-oriented implementations that can teach the model the intricate reasoning process when performing proof-oriented programming. We present the first on synthetic data augmentation for project level proof oriented programming for both generation and repair. Our method addresses data scarcity by synthesizing basic proof-oriented programming problems for proficiency in that language; incorporating diverse coding data for reasoning capability elicitation and creating new proofs and repair data within existing repositories. This approach enables language models to both synthesize and repair proofs for function- and repository-level code. We show that our fine-tuned 14B parameter model, PoPilot, can exceed the performance of the models that outperforms GPT-4o in project-level proof-oriented programming by 64% relative margin, and can improve GPT-4o's performance by 54% by repairing its outputs over GPT-4o's self-repair.

构建一个以证明为导向的程序员，比GPT-4o在数据稀缺情况下提高了64%

Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity

摘要

Summary

Support