
SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning

February 27, 2025
Authors: Zexiong Ma, Chao Peng, Pengfei Gao, Xiangxin Meng, Yanzhen Zou, Bing Xie
cs.AI

Abstract

Mainstream issue-resolving frameworks predominantly rely on commercial models, leading to high costs and privacy concerns. Existing training approaches for issue resolving struggle with poor generalization and fail to fully leverage open-source development resources. We propose Subtask-oriented Reinforced Fine-Tuning (SoRFT), a novel training approach that enhances the issue-resolving capability of LLMs. SoRFT decomposes issue resolving into structured subtasks: file localization, function localization, line localization, and code edit generation. SoRFT consists of two training stages: (1) rejection-sampled supervised fine-tuning, in which Chain of Thought (CoT) data is filtered using ground truth before fine-tuning the LLM, and (2) rule-based reinforcement learning, which applies Proximal Policy Optimization (PPO) with ground-truth-based rewards. We evaluate the SoRFT-trained models on SWE-Bench Verified and SWE-Bench Lite, achieving state-of-the-art (SOTA) performance among open-source models (e.g., SoRFT-Qwen-7B resolves 21.4% of issues on SWE-Bench Verified). The experimental results demonstrate that SoRFT significantly enhances issue-resolving performance, improves model generalization, and provides a cost-efficient alternative to commercial models.
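
As a rough illustration of the ground-truth-based signals described in the abstract, the sketch below shows how a rule-based reward for the file-localization subtask and a rejection-sampling filter for CoT data might be computed. The function names, the F1-style scoring, and the threshold are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of SoRFT-style ground-truth signals (not the authors' code).
# Assumes the ground truth is the set of files touched by the reference patch.

def file_localization_reward(predicted_files: list[str], gold_files: set[str]) -> float:
    """Rule-based reward: F1 overlap between predicted and ground-truth files (assumed metric)."""
    predicted = set(predicted_files)
    if not predicted or not gold_files:
        return 0.0
    tp = len(predicted & gold_files)
    precision = tp / len(predicted)
    recall = tp / len(gold_files)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rejection_sample_cot(samples: list[dict], gold_files: set[str], threshold: float = 1.0) -> list[dict]:
    """Stage 1 (rejection-sampled SFT): keep only CoT samples whose final answer
    agrees with the ground truth well enough to be used as fine-tuning data."""
    kept = []
    for sample in samples:
        # `sample["answer"]` is assumed to be the file list parsed from the model's CoT output.
        if file_localization_reward(sample["answer"], gold_files) >= threshold:
            kept.append(sample)
    return kept
```

The same pattern would presumably extend to the function- and line-localization subtasks, with ground truth derived from the reference patch; in stage 2, such rule-based rewards would be fed into PPO.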

