GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
November 21, 2024
Authors: Tianbin Li, Yanzhou Su, Wei Li, Bin Fu, Zhe Chen, Ziyan Huang, Guoan Wang, Chenglong Ma, Ying Chen, Ming Hu, Yanjun Li, Pengcheng Chen, Xiaowei Hu, Zhongying Deng, Yuanfeng Ji, Jin Ye, Yu Qiao, Junjun He
cs.AI
Abstract
Despite significant advances in general-purpose AI models such as GPT-4,
their effectiveness in the medical domain (general medical AI, GMAI)
remains constrained by the absence of specialized medical knowledge. To
address this challenge, we present GMAI-VL-5.5M, a comprehensive multimodal
medical dataset created by converting hundreds of specialized medical datasets
into meticulously constructed image-text pairs. This dataset features
comprehensive task coverage, diverse modalities, and high-quality image-text
data. Building upon this multimodal dataset, we propose GMAI-VL, a general
medical vision-language model trained with a progressive three-stage
strategy. This approach strengthens the model's integration of visual and
textual information, thereby improving its ability to process multimodal
data and support accurate diagnosis and clinical decision-making.
Experimental evaluations demonstrate that GMAI-VL achieves
state-of-the-art results across a wide range of multimodal medical tasks, such
as visual question answering and medical image diagnosis. Our contributions
include the development of the GMAI-VL-5.5M dataset, the introduction of the
GMAI-VL model, and the establishment of new benchmarks in multiple medical
domains. Code and dataset will be released at
https://github.com/uni-medical/GMAI-VL.