Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
April 24, 2025
Authors: Minju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang
cs.AI
Abstract
Despite the rapid growth of machine learning research, corresponding code
implementations are often unavailable, making it slow and labor-intensive for
researchers to reproduce results and build upon prior work. In the meantime,
recent Large Language Models (LLMs) excel at understanding scientific documents
and generating high-quality code. Inspired by this, we introduce PaperCoder, a
multi-agent LLM framework that transforms machine learning papers into
functional code repositories. PaperCoder operates in three stages: planning,
where it constructs a high-level roadmap, designs the system architecture with
diagrams, identifies file dependencies, and generates configuration files;
analysis, which focuses on interpreting implementation-specific details; and
generation, where modular, dependency-aware code is produced. Moreover, each
phase is instantiated through a set of specialized agents designed to
collaborate effectively across the pipeline. We then evaluate PaperCoder on
generating code implementations from machine learning papers, using both
model-based and human evaluations (the latter specifically from the original
paper authors), with author-released repositories as ground truth where
available. Our
results demonstrate the effectiveness of PaperCoder in creating high-quality,
faithful implementations. Furthermore, it consistently shows strengths in the
recently released PaperBench benchmark, surpassing strong baselines by
substantial margins.
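
The three stages described in the abstract (planning, analysis, generation) can be pictured as a simple sequential pipeline. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the `LLM` callable, the prompt wording, the `'file: dep1, dep2'` dependency format, and every function and class name here are hypothetical and chosen for illustration. The one idea it tries to capture is "dependency-aware" generation, approximated here by writing files in topological order so that each file can see the code of the files it depends on.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List

# Hypothetical LLM interface (an assumption): prompt text in, response text out.
LLM = Callable[[str], str]


@dataclass
class Plan:
    roadmap: str                        # high-level implementation roadmap
    architecture: str                   # system design, e.g. diagrams as text
    dependencies: Dict[str, List[str]]  # file name -> files it depends on
    config: str                         # generated configuration file


def parse_dependencies(text: str) -> Dict[str, List[str]]:
    """Parse lines of the assumed form 'file: dep1, dep2' into a mapping."""
    deps: Dict[str, List[str]] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip():
            deps[name.strip()] = [d.strip() for d in rest.split(",") if d.strip()]
    return deps


def plan_stage(paper: str, llm: LLM) -> Plan:
    """Planning: roadmap, architecture, file dependencies, configuration."""
    roadmap = llm(f"Write a high-level implementation roadmap.\n{paper}")
    architecture = llm(f"Design the system architecture.\n{roadmap}")
    deps_text = llm(f"List file dependencies as 'file: dep1, dep2'.\n{architecture}")
    config = llm(f"Write a configuration file.\n{architecture}")
    return Plan(roadmap, architecture, parse_dependencies(deps_text), config)


def build_order(plan: Plan) -> List[str]:
    """Order files so that each one comes after the files it depends on."""
    return list(TopologicalSorter(plan.dependencies).static_order())


def analysis_stage(paper: str, plan: Plan, llm: LLM) -> Dict[str, str]:
    """Analysis: collect implementation-specific details for each file."""
    return {
        name: llm(f"Describe implementation details for {name}.\n{paper}")
        for name in build_order(plan)
    }


def generation_stage(plan: Plan, analyses: Dict[str, str], llm: LLM) -> Dict[str, str]:
    """Generation: write each file after, and conditioned on, its dependencies."""
    repo: Dict[str, str] = {}
    for name in build_order(plan):
        context = "\n".join(repo[dep] for dep in plan.dependencies.get(name, []))
        repo[name] = llm(f"Write the code for {name}.\n{analyses[name]}\n{context}")
    return repo


def paper_to_repo(paper: str, llm: LLM) -> Dict[str, str]:
    """Run the three stages in sequence and return file name -> code."""
    plan = plan_stage(paper, llm)
    analyses = analysis_stage(paper, plan, llm)
    return generation_stage(plan, analyses, llm)
```

To try the sketch, pass any function that maps a prompt string to a response string as `llm`; a stub that returns a fixed dependency listing is enough to see the files come out in dependency order. A circular dependency listing would make `TopologicalSorter` raise `graphlib.CycleError`, which this sketch does not handle.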