On the Compositional Generalization of Multimodal LLMs for Medical Imaging
December 28, 2024
Authors: Zhenyang Cai, Junying Chen, Rongsheng Wang, Weihong Wang, Yonglin Deng, Dingjie Song, Yize Chen, Zixu Zhang, Benyou Wang
cs.AI
Abstract
Multimodal large language models (MLLMs) hold significant potential in the
medical field, but their capabilities are often limited by insufficient data in
certain medical domains, highlighting the need to understand which kinds of
images MLLMs can use to generalize. Current research suggests that multi-task
training outperforms single-task training because different tasks can benefit
one another, but such work often overlooks the internal relationships among
these tasks, providing limited guidance on selecting datasets to enhance
specific tasks. To
analyze this phenomenon, we use compositional generalization (CG), the ability
of models to understand novel combinations by recombining learned elements, as
a guiding framework. Because medical images can be precisely defined by
Modality, Anatomical area, and Task, they naturally provide an environment for
exploring CG. Therefore, we assembled 106 medical datasets to
create Med-MAT for comprehensive experiments. The experiments confirmed that
MLLMs can use CG to understand unseen medical images and identified CG as one
of the main drivers of the generalization observed in multi-task training.
Additionally, further studies demonstrated that CG effectively supports
datasets with limited data and delivers consistent performance across different
backbones, highlighting its versatility and broad applicability. Med-MAT is
publicly available at https://github.com/FreedomIntelligence/Med-MAT.
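The idea of defining images by Modality, Anatomical area, and Task and then testing on an unseen combination can be illustrated with a small sketch. It assumes each dataset carries exactly one (Modality, Anatomical area, Task) triple and treats a target as a compositional test case when the exact triple never appears in training but each of its three elements does. The class `MAT`, the function `is_compositional_target`, and all example labels are hypothetical and chosen for illustration; they are not taken from the Med-MAT repository or the paper's experimental protocol.

```python
from __future__ import annotations

from typing import NamedTuple


class MAT(NamedTuple):
    """Hypothetical (Modality, Anatomical area, Task) label for one dataset."""
    modality: str
    area: str
    task: str


def is_compositional_target(target: MAT, train: set[MAT]) -> bool:
    """Return True if `target` is an unseen combination whose individual
    elements (modality, area, task) each appear in some training triple."""
    if target in train:
        return False  # seen as a whole, so it does not test recombination
    seen_modalities = {t.modality for t in train}
    seen_areas = {t.area for t in train}
    seen_tasks = {t.task for t in train}
    return (
        target.modality in seen_modalities
        and target.area in seen_areas
        and target.task in seen_tasks
    )


# Illustrative training combinations (made-up labels, not Med-MAT's).
train = {
    MAT("CT", "Lung", "Disease classification"),
    MAT("X-ray", "Brain", "Tumor detection"),
    MAT("X-ray", "Lung", "Tumor detection"),
}

print(is_compositional_target(MAT("CT", "Brain", "Tumor detection"), train))  # True
print(is_compositional_target(MAT("MRI", "Lung", "Disease classification"), train))  # False
```

Checking each element independently is the loosest notion of coverage; a stricter variant could also require that certain pairs, such as (modality, area), have been seen together in training.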