Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Abstract

Despite the rapid growth of machine learning research, corresponding code implementations are often unavailable, making it slow and labor-intensive for researchers to reproduce results and build upon prior work. Meanwhile, recent Large Language Models (LLMs) excel at understanding scientific documents and generating high-quality code. Inspired by this, we introduce PaperCoder, a multi-agent LLM framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages: planning, where it constructs a high-level roadmap, designs the system architecture with diagrams, identifies file dependencies, and generates configuration files; analysis, which focuses on interpreting implementation-specific details; and generation, where modular, dependency-aware code is produced. Each stage is instantiated through a set of specialized agents designed to collaborate effectively across the pipeline. We then evaluate PaperCoder on generating code implementations from machine learning papers using both model-based evaluations and human evaluations, the latter specifically from the original paper authors, with author-released repositories as ground truth when available. Our results demonstrate the effectiveness of PaperCoder in creating high-quality, faithful implementations. Furthermore, PaperCoder consistently shows strengths on the recently released PaperBench benchmark, surpassing strong baselines by substantial margins.
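To make the three-stage flow concrete, the following is a minimal sketch of how planning, analysis, and generation might be chained. It is an illustration under stated assumptions, not the authors' implementation: the `Plan` fields, the function names, the prompt wording, the `config.yaml` file name, and the `fake_llm` stub are all hypothetical, and a real system would replace the stub with actual LLM calls made by specialized agents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
LLM = Callable[[str], str]


@dataclass
class Plan:
    roadmap: str           # high-level overview of what to implement
    architecture: str      # system design (e.g. a textual diagram)
    file_order: List[str]  # file names, dependencies listed first
    config: str            # contents of a generated configuration file


def plan_stage(paper: str, llm: LLM) -> Plan:
    """Planning: roadmap, architecture, file dependencies, configuration."""
    roadmap = llm(f"Write an implementation roadmap.\n{paper}")
    architecture = llm(f"Design the system architecture.\n{roadmap}")
    file_order = llm(f"List the files in dependency order.\n{architecture}").split()
    config = llm(f"Write a configuration file.\n{roadmap}")
    return Plan(roadmap, architecture, file_order, config)


def analysis_stage(paper: str, plan: Plan, llm: LLM) -> Dict[str, str]:
    """Analysis: per-file notes on implementation-specific details."""
    return {
        name: llm(f"Describe implementation details of {name}.\n{paper}")
        for name in plan.file_order
    }


def generation_stage(plan: Plan, notes: Dict[str, str], llm: LLM) -> Dict[str, str]:
    """Generation: write files in dependency order, showing earlier files to later ones."""
    repo: Dict[str, str] = {}
    for name in plan.file_order:
        existing = "\n".join(repo.values())  # code generated so far
        repo[name] = llm(f"Write {name}.\nNotes: {notes[name]}\nExisting code:\n{existing}")
    return repo


def paper_to_repo(paper: str, llm: LLM) -> Dict[str, str]:
    plan = plan_stage(paper, llm)
    notes = analysis_stage(paper, plan, llm)
    repo = generation_stage(plan, notes, llm)
    repo["config.yaml"] = plan.config  # assumed file name, for illustration only
    return repo


def fake_llm(prompt: str) -> str:
    """Deterministic stub so the sketch runs without any model."""
    if prompt.startswith("List the files"):
        return "model.py\ntrain.py"
    return f"<response to: {prompt.splitlines()[0]}>"


if __name__ == "__main__":
    for path, content in paper_to_repo("paper text", fake_llm).items():
        print(path, "->", content)
```

Generating files in dependency order is the design choice worth noting here: each later file is written with the earlier ones already in context, which is one plausible reading of "dependency-aware" generation.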
