GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

Abstract

Despite significant advances in general-purpose AI models such as GPT-4, their effectiveness in the medical domain (general medical AI, GMAI) remains limited by a lack of specialized medical knowledge. To address this challenge, we present GMAI-VL-5.5M, a comprehensive multimodal medical dataset created by converting hundreds of specialized medical datasets into meticulously constructed image-text pairs. The dataset features comprehensive task coverage, diverse modalities, and high-quality image-text data. Building on this dataset, we propose GMAI-VL, a general medical vision-language model trained with a progressive three-stage strategy. This approach integrates visual and textual information, improving the model's ability to process multimodal data and to support accurate diagnosis and clinical decision-making. Experimental evaluations demonstrate that GMAI-VL achieves state-of-the-art results across a wide range of multimodal medical tasks, such as visual question answering and medical image diagnosis. Our contributions include the development of the GMAI-VL-5.5M dataset, the introduction of the GMAI-VL model, and the establishment of new benchmarks in multiple medical domains. Code and dataset will be released at https://github.com/uni-medical/GMAI-VL.
