OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
December 22, 2024
作者: Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Yuhang Wang, Jinlin Xiao, Jitao Sang
cs.AI
Abstract
OpenAI's recent introduction of Reinforcement Fine-Tuning (RFT) showcases the
potential of reasoning foundation models and offers a new paradigm for
fine-tuning beyond simple pattern imitation. This technical report presents
OpenRFT, our attempt to fine-tune generalist reasoning models for
domain-specific tasks under the same settings as RFT. OpenRFT addresses two key
challenges, the lack of reasoning-step data and the limited quantity of
training samples, by leveraging the domain-specific samples in three ways:
question augmentation, synthesis of reasoning-process data, and few-shot ICL.
Evaluation is conducted on SciKnowEval, where OpenRFT achieves notable
performance gains with only 100 domain-specific samples per task. Additional
experimental results will be updated in later versions. Source code, datasets,
and models are available at: https://github.com/ADaM-BJTU/OpenRFT
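
The abstract names three ways the roughly 100 domain samples per task are reused: question augmentation, synthesis of reasoning-process data, and few-shot ICL. The following is a minimal, illustrative sketch of how such a data-preparation loop could be organized; it is not the authors' implementation, and `call_reasoning_model`, `augment_question`, and the prompt format are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Sample:
    question: str
    answer: str
    reasoning: str = ""  # synthesized reasoning trace; empty in the raw domain data

def prepare_training_data(
    samples: List[Sample],
    call_reasoning_model: Callable[[str], str],   # hypothetical: queries a reasoning model
    augment_question: Callable[[str], List[str]],  # hypothetical: paraphrases a question
    n_icl_examples: int = 3,
) -> List[Sample]:
    """Expand a small set of domain samples via the three strategies named in the abstract."""
    expanded: List[Sample] = []
    for sample in samples:
        # (1) Question augmentation: create paraphrased variants of each question.
        variants = [sample.question] + augment_question(sample.question)

        # (3) Few-shot ICL: prepend a few solved domain examples to steer the
        # generalist reasoning model toward the domain's style.
        icl_examples = [s for s in samples if s is not sample][:n_icl_examples]
        icl_prefix = "\n\n".join(f"Q: {s.question}\nA: {s.answer}" for s in icl_examples)

        for q in variants:
            # (2) Reasoning-process synthesis: ask the reasoning model for a
            # step-by-step solution, keeping only traces consistent with the known answer.
            prompt = f"{icl_prefix}\n\nQ: {q}\nThink step by step, then answer."
            trace = call_reasoning_model(prompt)
            if sample.answer in trace:  # crude answer-consistency filter (assumption)
                expanded.append(Sample(question=q, answer=sample.answer, reasoning=trace))
    return expanded
```

The expanded set produced by a pipeline like this could then serve both supervised warm-up and reinforcement fine-tuning; the filtering and prompt details here are assumptions for illustration only.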