HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
December 25, 2024
Authors: Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Jianye Hou, Benyou Wang
cs.AI
Abstract
The breakthrough of OpenAI o1 highlights the potential of enhancing reasoning
to improve LLMs. Yet, most research in reasoning has focused on mathematical
tasks, leaving domains like medicine underexplored. The medical domain, though
distinct from mathematics, also demands robust reasoning to provide reliable
answers, given the high standards of healthcare. However, verifying medical
reasoning is challenging, unlike in mathematics. To address this, we
propose verifiable medical problems with a medical verifier to check the
correctness of model outputs. This verifiable nature enables advancements in
medical reasoning through a two-stage approach: (1) using the verifier to guide
the search for a complex reasoning trajectory for fine-tuning LLMs, (2)
applying reinforcement learning (RL) with verifier-based rewards to enhance
complex reasoning further. Finally, we introduce HuatuoGPT-o1, a medical LLM
capable of complex reasoning, which outperforms general and medical-specific
baselines using only 40K verifiable problems. Experiments show complex
reasoning improves medical problem-solving and benefits more from RL. We hope
our approach inspires advancements in reasoning across medical and other
specialized domains.
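
The two-stage idea described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not specify how the verifier, the search, or the reward are implemented, so `Problem`, `toy_verifier`, `search_trajectory`, `verifier_reward`, and `scripted_propose` are hypothetical names, the verifier is a simple normalized string match, and the binary reward values are placeholders rather than the authors' settings.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Problem:
    question: str
    reference_answer: str  # ground truth that makes the problem verifiable


def toy_verifier(problem: Problem, answer: str) -> bool:
    """Hypothetical stand-in for a medical verifier: normalized exact match."""
    return answer.strip().lower() == problem.reference_answer.strip().lower()


def search_trajectory(
    problem: Problem,
    propose: Callable[[Problem, List[str]], str],
    verifier: Callable[[Problem, str], bool],
    max_attempts: int = 3,
) -> Optional[List[str]]:
    """Stage 1 sketch: propose answers until the verifier accepts one.

    Returns the list of attempts ending in a verified answer (a candidate
    fine-tuning trajectory), or None if no attempt was accepted.
    """
    attempts: List[str] = []
    for _ in range(max_attempts):
        answer = propose(problem, attempts)
        attempts.append(answer)
        if verifier(problem, answer):
            return attempts
    return None


def verifier_reward(
    problem: Problem,
    answer: str,
    verifier: Callable[[Problem, str], bool],
) -> float:
    """Stage 2 sketch: a verifier-based reward (values are assumptions)."""
    return 1.0 if verifier(problem, answer) else 0.0


def scripted_propose(problem: Problem, previous: List[str]) -> str:
    """Hypothetical model stub: a fixed wrong answer first, then a right one."""
    candidates = ["Vitamin C", "Vitamin B12"]
    return candidates[min(len(previous), len(candidates) - 1)]


# Made-up example problem, used only to exercise the sketch.
problem = Problem(
    question="Which vitamin deficiency causes pernicious anemia?",
    reference_answer="Vitamin B12",
)
trajectory = search_trajectory(problem, scripted_propose, toy_verifier)
reward = verifier_reward(problem, trajectory[-1], toy_verifier)
```

In this sketch the verifier plays both roles named in the abstract: it decides when the search has found a trajectory worth keeping for fine-tuning, and it supplies the scalar signal that an RL algorithm would optimize. A real system would replace the string match with a far more tolerant check of free-text medical answers.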