Large Language Model Guided Self-Debugging Code Generation
February 5, 2025
Authors: Muntasir Adnan, Zhiwei Xu, Carlos C. N. Kuhn
cs.AI
Abstract
Automated code generation is gaining significant importance in intelligent
computer programming and system deployment. However, current approaches often
face challenges in computational efficiency and lack robust mechanisms for code
parsing and error correction. In this work, we propose a novel framework,
PyCapsule, with a simple yet effective two-agent pipeline and efficient
self-debugging modules for Python code generation. PyCapsule features
sophisticated prompt inference, iterative error handling, and case testing,
ensuring high generation stability, safety, and correctness. Empirically,
PyCapsule achieves up to a 5.7% improvement in success rate on HumanEval, 10.3%
on HumanEval-ET, and 24.4% on BigCodeBench compared to state-of-the-art
methods. We also observe a decrease in normalized success rate as the number of
self-debugging attempts grows, potentially caused by the limited and noisy
error feedback retained between attempts. PyCapsule demonstrates broader
impacts on advancing lightweight and efficient code generation for artificial
intelligence systems.
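The abstract gives no implementation details, but the following minimal Python sketch illustrates the kind of two-agent generate-and-debug loop with a bounded self-debugging budget that it describes. All names here (llm_complete, run_tests, solve, max_debug_attempts) are hypothetical illustrations, not PyCapsule's actual API.

```python
# Hypothetical sketch of a two-agent pipeline with bounded self-debugging,
# in the spirit of the abstract. Not the authors' actual implementation.
from dataclasses import dataclass


@dataclass
class TestResult:
    passed: bool
    error: str = ""


def llm_complete(prompt: str) -> str:
    """Placeholder for a call to any code-generation LLM (assumption)."""
    raise NotImplementedError


def run_tests(code: str, tests: list[str]) -> TestResult:
    """Execute candidate code against case tests in a fresh namespace,
    capturing the error message as feedback for the debugging agent."""
    env: dict = {}
    try:
        exec(code, env)           # load the candidate solution
        for t in tests:
            exec(t, env)          # each test is e.g. an assert statement
        return TestResult(passed=True)
    except Exception as e:
        return TestResult(passed=False, error=f"{type(e).__name__}: {e}")


def solve(problem: str, tests: list[str], max_debug_attempts: int = 3) -> str | None:
    """Generator agent proposes code; on failure, a debugging agent retries
    with the captured error message, up to a fixed attempt budget."""
    code = llm_complete(f"Write a Python solution for:\n{problem}")
    for _ in range(max_debug_attempts):
        result = run_tests(code, tests)
        if result.passed:
            return code
        code = llm_complete(
            f"The following solution failed with `{result.error}`.\n"
            f"Problem:\n{problem}\nCode:\n{code}\nReturn a corrected version."
        )
    return None  # self-debugging budget exhausted
```

Capping the loop at max_debug_attempts reflects the abstract's observation that additional self-debugging attempts can yield diminishing returns when the retained error feedback is limited and noisy.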