

Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions

March 5, 2025
作者: Jun Li, Che Liu, Wenjia Bai, Rossella Arcucci, Cosmin I. Bercea, Julia A. Schnabel
cs.AI

Abstract

Visual Language Models (VLMs) have demonstrated impressive capabilities in visual grounding tasks. However, their effectiveness in the medical domain, particularly for abnormality detection and localization within medical images, remains underexplored. A major challenge is the complex and abstract nature of medical terminology, which makes it difficult to directly associate pathological anomaly terms with their corresponding visual features. In this work, we introduce a novel approach to enhance VLM performance in medical abnormality detection and localization by leveraging decomposed medical knowledge. Instead of directly prompting models to recognize specific abnormalities, we focus on breaking down medical concepts into fundamental attributes and common visual patterns. This strategy promotes a stronger alignment between textual descriptions and visual features, improving both the recognition and localization of abnormalities in medical images. We evaluate our method on the 0.23B Florence-2 base model and demonstrate that it achieves abnormality-grounding performance comparable to significantly larger 7B LLaVA-based medical VLMs, despite being trained on only 1.5% of the data used for such models. Experimental results also show the effectiveness of our approach on both known and previously unseen abnormalities, suggesting strong generalization capabilities.
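To illustrate the core idea of prompting with decomposed knowledge descriptions rather than bare abnormality terms, the following is a minimal sketch using the off-the-shelf Florence-2 base model through Hugging Face transformers. The abnormality term, the description text, and the image path are hypothetical placeholders for illustration and are not taken from the paper; the paper's actual training pipeline and prompt wording are not reproduced here.

    # Minimal sketch: grounding an abnormality with a decomposed knowledge
    # description instead of the raw pathology term (illustrative only).
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "microsoft/Florence-2-base"
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    # Hypothetical chest X-ray image path.
    image = Image.open("chest_xray.png").convert("RGB")

    # Raw term vs. a decomposed description of basic attributes and visual
    # patterns (example wording, not the paper's exact descriptions).
    term = "pneumothorax"
    knowledge_description = (
        "an abnormal collection of air in the pleural space, appearing as a "
        "dark, lucent region without lung markings along the chest wall"
    )

    # Use Florence-2's phrase-grounding task prompt with the description.
    task = "<CAPTION_TO_PHRASE_GROUNDING>"
    prompt = task + knowledge_description

    inputs = processor(text=prompt, images=image, return_tensors="pt")
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=512,
            num_beams=3,
        )
    generated_text = processor.batch_decode(
        generated_ids, skip_special_tokens=False
    )[0]

    # Convert the generated sequence into bounding boxes and labels.
    result = processor.post_process_generation(
        generated_text, task=task, image_size=(image.width, image.height)
    )
    print(result)

The intuition is that the description exposes visual attributes (location, brightness, texture) that a general-purpose grounding model can align with image features more easily than the abstract clinical term alone.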

