HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

September 9, 2024
Authors: Huy Nhat Phan, Phong X. Nguyen, Nghi D. Q. Bui
cs.AI

Abstract

Large Language Models (LLMs) have revolutionized software engineering (SE), demonstrating remarkable capabilities in various coding tasks. While recent efforts have produced autonomous LLM-based software agents for end-to-end development tasks, these systems are typically designed for specific SE tasks. We introduce HyperAgent, a novel generalist multi-agent system designed to address a wide spectrum of SE tasks across different programming languages by mimicking human developers' workflows. Comprising four specialized agents (Planner, Navigator, Code Editor, and Executor), HyperAgent manages the full lifecycle of SE tasks, from initial conception to final verification. Through extensive evaluations, HyperAgent achieves state-of-the-art performance across diverse SE tasks: for GitHub issue resolution, it attains a 25.01% success rate on SWE-Bench-Lite and 31.40% on SWE-Bench-Verified, surpassing existing methods. Furthermore, HyperAgent demonstrates state-of-the-art performance in repository-level code generation (RepoExec) and in fault localization and program repair (Defects4J), often outperforming specialized systems. This work represents a significant advance towards versatile, autonomous agents capable of handling complex, multi-step SE tasks across various domains and languages, potentially transforming AI-assisted software development practices.
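
The abstract names the four agents but does not say how they exchange messages, so the following is only an illustrative sketch of one possible arrangement, not HyperAgent's implementation. It assumes the Planner hands a task to the Navigator, Code Editor, and Executor in a fixed order; the class `Planner`, the helper `make_stub`, the `ROLES` tuple, and the stub agents are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Role names come from the abstract; the fixed ordering is an assumption.
ROLES = ("navigator", "editor", "executor")


@dataclass
class Planner:
    """Hypothetical coordinator: each child agent is a plain callable."""
    children: Dict[str, Callable[[str], str]]
    log: List[str] = field(default_factory=list)

    def solve(self, task: str) -> List[str]:
        # Assumed workflow: locate the code, edit it, then run a check.
        context = task
        for role in ROLES:
            context = self.children[role](context)
            self.log.append(f"{role}: {context}")
        return self.log


def make_stub(role: str) -> Callable[[str], str]:
    # Stand-in for an LLM-backed agent; it only tags the message it received.
    return lambda message: f"{role} handled <{message}>"


planner = Planner({role: make_stub(role) for role in ROLES})
steps = planner.solve("fix failing test")
for step in steps:
    print(step)
```

Each stub could be swapped for a real model call without changing the coordinator, which is the main reason for keeping the child agents behind a plain callable interface in this sketch.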
