데이터 희귀성 하에서 GPT-4o보다 64% 더 나은 증명 중심 프로그래머 구축하기

초록

기존 언어 모델은 데이터 부족으로 증명 중심 프로그래밍에 어려움을 겪고 있습니다. 이는 두 가지 주요 방식으로 나타납니다: (1) F*와 같은 증명 중심 프로그래밍 언어에 충분한 말뭉치의 부족, (2) 모델이 증명 중심 프로그래밍을 수행할 때 복잡한 추론 과정을 학습할 수 있는 대규모 프로젝트 수준의 증명 중심 구현의 부재. 우리는 프로젝트 수준 증명 중심 프로그래밍을 위한 합성 데이터 증강에 대한 첫 번째 방법을 제시합니다. 우리의 방법은 해당 언어에 능숙해지기 위해 기본 증명 중심 프로그래밍 문제를 합성함으로써 데이터 부족 문제를 해결하며, 추론 능력 유도를 위해 다양한 코딩 데이터를 통합하고 기존 저장소 내에서 새로운 증명과 수리 데이터를 생성합니다. 이 접근법은 언어 모델이 함수 및 저장소 수준 코드에 대해 증명을 합성하고 수리할 수 있게 합니다. 우리는 세밀하게 조정된 14B 매개변수 모델인 PoPilot이 프로젝트 수준 증명 중심 프로그래밍에서 GPT-4o를 64% 상대적으로 능가하는 모델의 성능을 보여주며, GPT-4o의 성능을 54% 향상시키는 것으로 나타냅니다. 이는 GPT-4o의 자가 수리를 통해 GPT-4o의 출력을 수리함으로써 달성됩니다.

English

Existing LMs struggle with proof-oriented programming due to data scarcity, which manifest in two key ways: (1) a lack of sufficient corpora for proof-oriented programming languages such as F*, and (2) the absence of large-scale, project-level proof-oriented implementations that can teach the model the intricate reasoning process when performing proof-oriented programming. We present the first on synthetic data augmentation for project level proof oriented programming for both generation and repair. Our method addresses data scarcity by synthesizing basic proof-oriented programming problems for proficiency in that language; incorporating diverse coding data for reasoning capability elicitation and creating new proofs and repair data within existing repositories. This approach enables language models to both synthesize and repair proofs for function- and repository-level code. We show that our fine-tuned 14B parameter model, PoPilot, can exceed the performance of the models that outperforms GPT-4o in project-level proof-oriented programming by 64% relative margin, and can improve GPT-4o's performance by 54% by repairing its outputs over GPT-4o's self-repair.

데이터 희귀성 하에서 GPT-4o보다 64% 더 나은 증명 중심 프로그래머 구축하기

Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity

초록

Summary

Support