CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging

Abstract

Large Language Models (LLMs) have made significant strides in code generation and problem solving. Current approaches employ external tool-based iterative debuggers that use compiler or other tool-based runtime feedback to refine coarse programs generated by various methods. However, the effectiveness of these approaches relies heavily on the quality of the initial code generation, which remains an open challenge. In this paper, we introduce CodeSim, a novel multi-agent code generation framework that comprehensively addresses the three stages of program synthesis (planning, coding, and debugging) through a human-like perception approach. Just as humans verify their understanding of an algorithm through visual simulation, CodeSim uniquely features a method of plan verification and internal debugging through step-by-step simulation of input/output. Extensive experiments across seven challenging competitive problem-solving and program synthesis benchmarks demonstrate CodeSim's remarkable code generation capabilities. Our framework achieves new state-of-the-art pass@1 results (HumanEval 95.1%, MBPP 90.7%, APPS 22%, and CodeContests 29.1%). Furthermore, our method shows potential for even greater enhancement when cascaded with external debuggers. To facilitate further research and development in this area, we have open-sourced our framework at https://kagnlp.github.io/codesim.github.io/.
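The abstract names three stages (planning, coding, debugging) and says that both plan verification and debugging rely on step-by-step simulation of input/output. A minimal sketch of how such a control loop might be wired together is given below. It is an illustration under stated assumptions, not the authors' implementation: the `Agents` container, the `solve` function, the retry limits `max_plans` and `max_debug`, and all callable signatures are hypothetical, and each callable stands in for what would presumably be an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (input, expected output) pair taken from the problem statement.
Example = Tuple[str, str]


@dataclass
class Agents:
    """Hypothetical agent interfaces; each would wrap an LLM call in practice."""
    plan: Callable[[str], str]                     # problem -> plan
    simulate_plan: Callable[[str, Example], bool]  # does tracing the plan reproduce the expected output?
    code: Callable[[str, str], str]                # (problem, plan) -> program
    simulate_code: Callable[[str, Example], bool]  # does tracing the program reproduce the expected output?
    debug: Callable[[str, str, Example], str]      # (problem, program, failing example) -> revised program


def solve(problem: str, examples: List[Example], agents: Agents,
          max_plans: int = 3, max_debug: int = 3) -> str:
    """Plan, verify the plan by simulation, code, then debug by simulation.

    The retry limits are assumptions made for this sketch.
    """
    program = ""
    for _ in range(max_plans):
        plan = agents.plan(problem)
        # Plan verification: reject the plan if simulating it on any example fails.
        if not all(agents.simulate_plan(plan, ex) for ex in examples):
            continue
        program = agents.code(problem, plan)
        # Internal debugging: simulate the program, revise it on the first failure.
        for attempt in range(max_debug + 1):
            failing = [ex for ex in examples if not agents.simulate_code(program, ex)]
            if not failing:
                return program
            if attempt < max_debug:
                program = agents.debug(problem, program, failing[0])
    # Best effort: return the last program produced (empty if no plan passed).
    return program
```

With stub agents in place of model calls, `solve(problem, examples, agents)` returns the first program whose simulation passes every example, or the last candidate once the retry budgets are exhausted. No external compiler or runtime is invoked in this sketch, which mirrors the abstract's distinction between internal simulation-based debugging and external tool-based debuggers.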
