SilVar-Med: A Speech-Driven Visual Language Model for Explainable Abnormality Detection in Medical Imaging

April 14, 2025
Authors: Tan-Hanh Pham, Chris Ngo, Trong-Duong Bui, Minh Luu Quang, Tan-Huong Pham, Truong-Son Hy
cs.AI

Abstract

Medical Visual Language Models (VLMs) have shown great potential in various healthcare applications, including medical image captioning and diagnostic assistance. However, most existing models rely on text-based instructions, which limits their usability in real-world clinical environments, especially in scenarios such as surgery, where text-based interaction is often impractical for physicians. In addition, current medical image analysis models typically lack comprehensive reasoning behind their predictions, which reduces their reliability for clinical decision-making. Given that medical diagnosis errors can have life-changing consequences, there is a critical need for interpretable and rational medical assistance. To address these challenges, we introduce SilVar-Med, an end-to-end speech-driven medical VLM: a multimodal medical image assistant that integrates speech interaction with VLMs, pioneering the task of voice-based communication for medical image analysis. In addition, we focus on interpreting the reasoning behind each prediction of medical abnormalities using a proposed reasoning dataset. Through extensive experiments, we demonstrate a proof-of-concept study for reasoning-driven medical image interpretation with end-to-end speech interaction. We believe this work will advance the field of medical AI by fostering more transparent, interactive, and clinically viable diagnostic support systems. Our code and dataset are publicly available at SilVar-Med.
