SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution
January 9, 2025
Authors: Chengxing Xie, Bowen Li, Chang Gao, He Du, Wai Lam, Difan Zou, Kai Chen
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency across
a variety of complex tasks. One significant application of LLMs is in tackling
software engineering challenges, particularly in resolving real-world tasks on
GitHub by fixing code based on the issues reported by users. However, many
current approaches rely on proprietary LLMs, which limits reproducibility,
accessibility, and transparency. The critical components of LLMs for addressing
software engineering issues and how their capabilities can be effectively
enhanced remain unclear. To address these challenges, we introduce SWE-Fixer, a
novel open-source LLM designed to effectively and efficiently resolve GitHub
issues. SWE-Fixer comprises two essential modules: a code file retrieval module
and a code editing module. The retrieval module employs BM25 along with a
lightweight LLM to achieve coarse-to-fine file retrieval. Subsequently,
the code editing module uses a second LLM to generate patches for
the identified files. To mitigate the lack of publicly available
datasets, we compile an extensive dataset that includes 110K GitHub issues
along with their corresponding patches, and train the two modules of SWE-Fixer
separately. We assess our approach on the SWE-Bench Lite and Verified
benchmarks, achieving state-of-the-art performance among open-source models
with scores of 23.3% and 30.2%, respectively. These outcomes highlight the
efficacy of our approach. We will make our model, dataset, and code publicly
available at https://github.com/InternLM/SWE-Fixer.
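
The coarse-to-fine file retrieval described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SWE-Fixer implementation: the names `tokenize`, `bm25_rank`, and `retrieve_files`, the simple regex tokenizer, the `k1` and `b` defaults, and the `rerank` callback (a stand-in for the lightweight LLM) are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric/underscore runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Return (path, score) pairs sorted by Okapi BM25 score, highest first.

    `documents` maps a file path to its text content.
    """
    tokenized = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / max(n_docs, 1)

    # Document frequency: number of files that contain each term.
    doc_freq = Counter()
    for toks in tokenized.values():
        doc_freq.update(set(toks))

    query_terms = set(tokenize(query))
    scores = {}
    for name, toks in tokenized.items():
        term_freq = Counter(toks)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def retrieve_files(issue_text, repo_files, coarse_k=3, rerank=None):
    """Coarse stage: keep the top `coarse_k` BM25 hits. Fine stage: optional reranker.

    `rerank` is a hypothetical callable (issue_text, candidates) -> list of paths
    standing in for the lightweight LLM; if omitted, the BM25 order is returned.
    """
    candidates = [name for name, _ in bm25_rank(issue_text, repo_files)[:coarse_k]]
    return rerank(issue_text, candidates) if rerank else candidates
```

In a fuller pipeline, the files returned by `retrieve_files` would be passed, together with the issue text, to the code editing model that generates the patch; that step is omitted here because the abstract does not specify its interface.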