On the Compositional Generalization of Multimodal LLMs for Medical Imaging
December 28, 2024
Authors: Zhenyang Cai, Junying Chen, Rongsheng Wang, Weihong Wang, Yonglin Deng, Dingjie Song, Yize Chen, Zixu Zhang, Benyou Wang
cs.AI
Abstract
Multimodal large language models (MLLMs) hold significant potential in the
medical field, but their capabilities are often limited by insufficient data in
certain medical domains, highlighting the need for understanding what kinds of
images can be used by MLLMs for generalization. Current research suggests that
multi-task training outperforms single-task training because different tasks can
benefit each other, but it often overlooks the internal relationships among
these tasks, providing limited guidance on selecting datasets to enhance
specific tasks. To analyze this phenomenon, we attempted to employ compositional
generalization (CG), the ability of models to understand novel combinations by
recombining learned elements, as a guiding framework. Medical images can be
precisely defined by Modality, Anatomical area, and Task, so they naturally
provide an environment for exploring CG. We therefore assembled 106 medical
datasets to create Med-MAT for comprehensive experiments. The experiments confirmed that
MLLMs can use CG to understand unseen medical images and identified CG as one
of the main drivers of the generalization observed in multi-task training.
Additionally, further studies demonstrated that CG effectively supports
datasets with limited data and delivers consistent performance across different
backbones, highlighting its versatility and broad applicability. Med-MAT is
publicly available at https://github.com/FreedomIntelligence/Med-MAT.
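
To make the CG setup concrete, the following is a minimal sketch, not the authors' code, of how a held-out dataset can be checked against a training pool when every dataset is labeled by a (Modality, Anatomical area, Task) triple. The class name MAT, the helper functions, and the example labels are all hypothetical and are not taken from Med-MAT.

```python
from typing import List, NamedTuple


class MAT(NamedTuple):
    """Hypothetical label for one dataset: Modality, Anatomical area, Task."""
    modality: str
    area: str
    task: str


def is_compositional_target(target: MAT, train: List[MAT]) -> bool:
    """Return True if the target triple is unseen as a whole,
    while each of its three elements appears somewhere in training."""
    if target in train:
        return False  # the exact combination was already seen
    return (
        any(d.modality == target.modality for d in train)
        and any(d.area == target.area for d in train)
        and any(d.task == target.task for d in train)
    )


def related_datasets(target: MAT, pool: List[MAT]) -> List[MAT]:
    """Return datasets in the pool that share at least one element
    with the target, excluding the target combination itself."""
    return [
        d for d in pool
        if d != target
        and (d.modality == target.modality
             or d.area == target.area
             or d.task == target.task)
    ]


# Illustrative labels only (assumed for this sketch, not Med-MAT entries).
train = [
    MAT("X-ray", "chest", "classification"),
    MAT("CT", "brain", "classification"),
    MAT("MRI", "brain", "segmentation"),
]
target = MAT("CT", "chest", "classification")

cg_ok = is_compositional_target(target, train)
related = related_datasets(target, train)
```

Here the target combination never appears in training, but CT, chest, and classification each do, so it qualifies as a CG test case under this definition. The MRI brain segmentation dataset shares no element with the target and is left out of the related set.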