BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

December 10, 2024
Authors: Sahal Shaji Mullappilly, Mohammed Irfan Kurpath, Sara Pieri, Saeed Yahya Alseiari, Shanavas Cholakkal, Khaled Aldahmani, Fahad Khan, Rao Anwer, Salman Khan, Timothy Baldwin, Hisham Cholakkal
cs.AI

Abstract

This paper introduces BiMediX2, a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model (LMM) with a unified architecture that integrates text and visual modalities, enabling advanced image understanding and medical applications. Built on the Llama3.1 architecture, BiMediX2 supports seamless interaction in both English and Arabic, handling text-based inputs as well as multi-turn conversations involving medical images. The model is trained on an extensive bilingual healthcare dataset of 1.6M samples covering diverse medical interactions in both text and image modalities, in a mix of Arabic and English. We also propose BiMed-MBench, the first bilingual GPT-4o based medical LMM benchmark. BiMediX2 is benchmarked on both text-based and image-based tasks and achieves state-of-the-art performance across several medical benchmarks, outperforming recent state-of-the-art models on medical LLM evaluation benchmarks. It also sets a new state of the art in multimodal medical evaluations, with improvements of over 9% in English and over 20% in Arabic. Additionally, it surpasses GPT-4 by around 9% in UPHILL factual accuracy evaluations and excels in various medical Visual Question Answering, Report Generation, and Report Summarization tasks. The project page, including source code and the trained model, is available at https://github.com/mbzuai-oryx/BiMediX2.
