GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
November 21, 2024
Authors: Tianbin Li, Yanzhou Su, Wei Li, Bin Fu, Zhe Chen, Ziyan Huang, Guoan Wang, Chenglong Ma, Ying Chen, Ming Hu, Yanjun Li, Pengcheng Chen, Xiaowei Hu, Zhongying Deng, Yuanfeng Ji, Jin Ye, Yu Qiao, Junjun He
cs.AI
Abstract
Despite significant advancements in general artificial intelligence models such
as GPT-4, their effectiveness in the medical domain (general medical AI, GMAI)
remains constrained by the absence of specialized medical knowledge. To
address this challenge, we present GMAI-VL-5.5M, a comprehensive multimodal
medical dataset created by converting hundreds of specialized medical datasets
into meticulously constructed image-text pairs. This dataset features
comprehensive task coverage, diverse modalities, and high-quality image-text
data. Building upon this multimodal dataset, we propose GMAI-VL, a general
medical vision-language model trained with a progressive three-stage strategy.
This approach significantly enhances the model's capabilities by integrating
visual and textual information, improving its ability to
process multimodal data and support accurate diagnosis and clinical
decision-making. Experimental evaluations demonstrate that GMAI-VL achieves
state-of-the-art results across a wide range of multimodal medical tasks, such
as visual question answering and medical image diagnosis. Our contributions
include the development of the GMAI-VL-5.5M dataset, the introduction of the
GMAI-VL model, and the establishment of new benchmarks in multiple medical
domains. Code and dataset will be released at
https://github.com/uni-medical/GMAI-VL.
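The abstract above only summarizes how GMAI-VL-5.5M is built by converting specialized medical datasets into image-text pairs. As a purely illustrative aid, the minimal Python sketch below shows how a single sample from a classification-style dataset could be rewritten into an instruction-style image-text record; the field names, prompt wording, and the to_image_text_pair helper are hypothetical assumptions, not the paper's actual schema.

```python
import json

# Hypothetical record layout for an instruction-style image-text pair.
# Field names and prompt templates are illustrative only; the actual
# GMAI-VL-5.5M schema may differ.
def to_image_text_pair(image_path, modality, label):
    return {
        "image": image_path,
        "conversations": [
            {
                "from": "human",
                "value": f"<image>\nWhat abnormality, if any, is visible in this {modality} image?",
            },
            {
                "from": "gpt",
                "value": f"The image shows findings consistent with {label}.",
            },
        ],
    }

if __name__ == "__main__":
    # Convert one (image, label) sample from a hypothetical source dataset.
    record = to_image_text_pair("chest_xray_0001.png", "chest X-ray", "pneumonia")
    print(json.dumps(record, indent=2, ensure_ascii=False))
```

In practice, each source dataset would need its own prompt templates and label phrasing, which is consistent with the abstract's description of the pairs as meticulously constructed.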