OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

December 22, 2024
Authors: Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Yuhang Wang, Jinlin Xiao, Jitao Sang
cs.AI

Abstract

OpenAI's recent introduction of Reinforcement Fine-Tuning (RFT) showcases the potential of reasoning foundation models and offers a new paradigm for fine-tuning beyond simple pattern imitation. This technical report presents OpenRFT, our attempt to fine-tune generalist reasoning models for domain-specific tasks under the same settings as RFT. OpenRFT addresses two key challenges, the lack of reasoning-step data and the limited number of training samples, by leveraging domain-specific samples in three ways: question augmentation, synthesizing reasoning-process data, and few-shot ICL. The evaluation is conducted on SciKnowEval, where OpenRFT achieves notable performance gains with only 100 domain-specific samples per task. More experimental results will be updated continuously in later versions. The source code, datasets, and models are available at: https://github.com/ADaM-BJTU/OpenRFT
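
The abstract names three ways of stretching a small set of domain-specific samples. As a rough illustration only, the sketch below shows what two of them, question augmentation and few-shot ICL, could look like for multiple-choice items; it assumes augmentation by shuffling answer options and a plain prompt-prefix form of ICL, and all function names here are hypothetical rather than taken from the OpenRFT codebase.

```python
import random

def augment_question(question: str, options: list[str], answer_idx: int, seed: int = 0):
    """Hypothetical question augmentation: shuffle the answer options of a
    multiple-choice item to create an extra training sample with the same content."""
    rng = random.Random(seed)
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    new_answer_idx = order.index(answer_idx)  # track where the correct option moved
    return question, shuffled, new_answer_idx

def build_icl_prompt(demos: list[dict], query: str) -> str:
    """Hypothetical few-shot ICL prompt: prepend a handful of solved
    domain-specific examples before the query to steer the reasoning model."""
    parts = [f"Question: {d['question']}\nAnswer: {d['answer']}\n" for d in demos]
    parts.append(f"Question: {query}\nAnswer:")
    return "\n".join(parts)

if __name__ == "__main__":
    q, opts, ans = augment_question(
        "Which element has the highest electronegativity?",
        ["Oxygen", "Fluorine", "Chlorine", "Nitrogen"],
        answer_idx=1,
    )
    print(opts, ans)
    print(build_icl_prompt([{"question": "2 + 2 = ?", "answer": "4"}], query=q))
```

Whether augmentation operates on option order, paraphrases, or other transformations, and how the synthesized reasoning-process data is produced, is specified in the full report and repository rather than in this sketch.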
