HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Abstract

The breakthrough of OpenAI o1 highlights the potential of enhanced reasoning for improving LLMs. Yet most research on reasoning has focused on mathematical tasks, leaving domains such as medicine underexplored. Although medicine differs from mathematics, it also demands robust reasoning to provide reliable answers, given the high standards of healthcare. Unlike mathematical reasoning, however, medical reasoning is difficult to verify. To address this, we propose verifiable medical problems paired with a medical verifier that checks the correctness of model outputs. This verifiability advances medical reasoning through a two-stage approach: (1) using the verifier to guide the search for complex reasoning trajectories that are then used to fine-tune LLMs, and (2) applying reinforcement learning (RL) with verifier-based rewards to further enhance complex reasoning. Finally, we introduce HuatuoGPT-o1, a medical LLM capable of complex reasoning, which outperforms general and medical-specific baselines using only 40K verifiable problems. Experiments show that complex reasoning improves medical problem-solving and benefits more from RL than simpler reasoning does. We hope our approach inspires advances in reasoning across medicine and other specialized domains.
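The two-stage approach can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `VerifiableProblem`, `verify`, `search_trajectory`, `verifier_reward`, and `toy_policy`, the refinement-strategy labels, and the reward values are all hypothetical. The verifier here is a simple case- and whitespace-insensitive string match standing in for the medical verifier. Stage 1 keeps extending a reasoning trajectory until the verifier accepts the final answer. Stage 2 turns the same verifier into a sparse RL reward.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class VerifiableProblem:
    question: str
    answer: str  # ground-truth answer the verifier checks against


def _normalize(text: str) -> str:
    return " ".join(text.lower().split())


def verify(predicted: str, problem: VerifiableProblem) -> bool:
    """Stand-in verifier (assumption): case- and whitespace-insensitive match."""
    return _normalize(predicted) == _normalize(problem.answer)


# Hypothetical policy interface: (question, trajectory so far, strategy) -> (reasoning step, answer)
Policy = Callable[[str, List[str], Optional[str]], Tuple[str, str]]

# Hypothetical labels for ways a failed attempt can be refined.
STRATEGIES = ("backtrack", "explore new path", "verify", "correct")


def search_trajectory(problem: VerifiableProblem, policy: Policy,
                      max_attempts: int = 4) -> Optional[List[str]]:
    """Stage 1 sketch: extend the reasoning until the verifier accepts the answer."""
    trajectory: List[str] = []
    for attempt in range(max_attempts):
        # No strategy on the first attempt; afterwards cycle through the labels in order.
        strategy = None if attempt == 0 else STRATEGIES[(attempt - 1) % len(STRATEGIES)]
        step, answer = policy(problem.question, trajectory, strategy)
        trajectory.append(step)
        if verify(answer, problem):
            return trajectory  # verified trajectory: candidate fine-tuning data
    return None  # no verified trajectory within the budget; the problem is skipped


def verifier_reward(answer: str, problem: VerifiableProblem,
                    correct: float = 1.0, incorrect: float = 0.0) -> float:
    """Stage 2 sketch: sparse RL reward from the verifier (reward values are assumptions)."""
    return correct if verify(answer, problem) else incorrect


def toy_policy(question: str, trajectory: List[str], strategy: Optional[str]) -> Tuple[str, str]:
    # Toy policy: wrong on the first attempt, right once a refinement strategy is applied.
    if strategy is None:
        return "Initial reasoning about the symptoms.", "hypertension"
    return f"[{strategy}] Re-examine the lab results.", "Hypothyroidism"


problem = VerifiableProblem("Which condition best explains the findings?", "hypothyroidism")
print(search_trajectory(problem, toy_policy))
print(verifier_reward("Hypothyroidism", problem), verifier_reward("hypertension", problem))
```

With the toy policy, the search fails once, succeeds after one "backtrack" step, and returns a two-step trajectory. The same verifier then yields a reward of 1.0 for the correct answer and 0.0 for the wrong one. Reusing one verifier for both stages reflects the abstract's point that verifiability drives both the search and the RL signal.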