HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

December 25, 2024
Authors: Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Jianye Hou, Benyou Wang
cs.AI

Abstract

The breakthrough of OpenAI o1 highlights the potential of enhanced reasoning to improve LLMs. Yet most reasoning research has focused on mathematical tasks, leaving domains such as medicine underexplored. The medical domain, though distinct from mathematics, also demands robust reasoning to provide reliable answers, given the high standards of healthcare. However, verifying medical reasoning is challenging, unlike verifying mathematical reasoning. To address this, we propose verifiable medical problems with a medical verifier that checks the correctness of model outputs. This verifiability enables advances in medical reasoning through a two-stage approach: (1) using the verifier to guide the search for a complex reasoning trajectory for fine-tuning LLMs, and (2) applying reinforcement learning (RL) with verifier-based rewards to further enhance complex reasoning. Finally, we introduce HuatuoGPT-o1, a medical LLM capable of complex reasoning, which outperforms general and medical-specific baselines using only 40K verifiable problems. Experiments show that complex reasoning improves medical problem-solving and benefits more from RL. We hope our approach inspires advances in reasoning across medical and other specialized domains.
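
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the exact-match `toy_verifier`, the `scripted_policy` stand-in for a model, the binary reward values, and the example inputs are all hypothetical, chosen only to show how a verifier could gate a trajectory search (stage 1) and supply a reward signal (stage 2).

```python
from typing import Callable, List, Optional, Tuple


def toy_verifier(answer: str, reference: str) -> bool:
    """Hypothetical stand-in for a medical verifier: case-insensitive exact match."""
    return answer.strip().lower() == reference.strip().lower()


def search_trajectory(
    propose: Callable[[int], Tuple[str, str]],
    reference: str,
    max_attempts: int = 3,
) -> Optional[Tuple[List[str], str]]:
    """Stage 1 sketch: keep proposing (reasoning, answer) pairs until the
    verifier accepts one, and return the accumulated reasoning steps together
    with the accepted answer. Returns None if no attempt is accepted."""
    trajectory: List[str] = []
    for attempt in range(max_attempts):
        reasoning, answer = propose(attempt)
        trajectory.append(reasoning)
        if toy_verifier(answer, reference):
            return trajectory, answer
    return None


def verifier_reward(answer: str, reference: str) -> float:
    """Stage 2 sketch: a binary reward derived from the verifier (assumed values)."""
    return 1.0 if toy_verifier(answer, reference) else 0.0


def scripted_policy(attempt: int) -> Tuple[str, str]:
    """Hypothetical 'model' that answers wrongly first, then corrects itself."""
    steps = [
        ("Initial guess from the most common cause.", "influenza"),
        ("Re-check the findings and revise the answer.", "Meningitis"),
    ]
    return steps[min(attempt, len(steps) - 1)]
```

For instance, `search_trajectory(scripted_policy, "meningitis")` collects both reasoning steps before the verifier accepts the second answer; a trajectory found this way could serve as fine-tuning data, while `verifier_reward` could score sampled answers during RL.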
