MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
February 26, 2025
Authors: Jiazhen Pan, Che Liu, Junde Wu, Fenglin Liu, Jiayuan Zhu, Hongwei Bran Li, Chen Chen, Cheng Ouyang, Daniel Rueckert
cs.AI
Abstract
Reasoning is a critical frontier for advancing medical image analysis, where
transparency and trustworthiness play a central role in both clinician trust
and regulatory approval. Although Medical Visual Language Models (VLMs) show
promise for radiological tasks, most existing VLMs merely produce final answers
without revealing the underlying reasoning. To address this gap, we introduce
MedVLM-R1, a medical VLM that explicitly generates natural language reasoning
to enhance transparency and trustworthiness. Instead of relying on supervised
fine-tuning (SFT), which often suffers from overfitting to training
distributions and fails to foster genuine reasoning, MedVLM-R1 employs a
reinforcement learning framework that incentivizes the model to discover
human-interpretable reasoning paths without using any reasoning references.
Despite limited training data (600 visual question answering samples) and model
parameters (2B), MedVLM-R1 boosts accuracy from 55.11% to 78.22% across MRI,
CT, and X-ray benchmarks, outperforming larger models trained on over a million
samples. It also demonstrates robust domain generalization under
out-of-distribution tasks. By unifying medical image analysis with explicit
reasoning, MedVLM-R1 marks a pivotal step toward trustworthy and interpretable
AI in clinical practice.
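
The abstract does not say how the reinforcement signal is computed. As a rough illustration only, a reward that needs no reasoning references could be built from two rule-based checks: whether the output follows a reasoning-then-answer template, and whether the final choice matches the ground-truth label. The tag names, the equal weighting, and every function name below are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical output template: free-form reasoning, then one answer letter.
# The <think>/<answer> tag names are an assumption made for this sketch.
TEMPLATE = re.compile(
    r"\s*<think>(.+?)</think>\s*<answer>\s*([A-D])\s*</answer>\s*",
    re.DOTALL,
)


def format_reward(output: str) -> float:
    """1.0 if the whole output follows the template, else 0.0."""
    return 1.0 if TEMPLATE.fullmatch(output) else 0.0


def accuracy_reward(output: str, correct_choice: str) -> float:
    """1.0 if the extracted answer letter equals the label, else 0.0.

    Only the final choice is compared; the reasoning text is never scored
    against a reference, which mirrors the "no reasoning references" idea.
    """
    match = TEMPLATE.fullmatch(output)
    if match is None:
        return 0.0
    return 1.0 if match.group(2) == correct_choice else 0.0


def total_reward(output: str, correct_choice: str) -> float:
    """Sum of the two checks (equal weighting is an assumption)."""
    return format_reward(output) + accuracy_reward(output, correct_choice)


if __name__ == "__main__":
    sample = "<think>placeholder reasoning text</think><answer>B</answer>"
    print(total_reward(sample, "B"))  # 2.0: well-formed and correct
    print(total_reward(sample, "C"))  # 1.0: well-formed but wrong choice
    print(total_reward("B", "B"))     # 0.0: no reasoning, template not met
```

Because the reward depends only on the output format and the final choice, a policy-gradient method could in principle be trained on plain question-answer pairs; which algorithm and reward the authors actually use has to be checked in the paper itself.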