BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

Abstract

This paper introduces BiMediX2, a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model (LMM) whose unified architecture integrates text and visual modalities, enabling advanced image understanding and medical applications. Built on the Llama3.1 architecture, BiMediX2 supports seamless interaction in both English and Arabic, handling text-based inputs as well as multi-turn conversations involving medical images. The model is trained on an extensive bilingual healthcare dataset of 1.6M samples covering diverse medical interactions in both text and image modalities, with Arabic and English mixed. We also propose BiMed-MBench, the first bilingual GPT-4o based medical LMM benchmark. BiMediX2 is evaluated on both text-based and image-based tasks, achieves state-of-the-art performance across several medical benchmarks, and outperforms recent state-of-the-art models on medical LLM evaluation benchmarks. In multimodal medical evaluations, it sets a new benchmark, improving by over 9% in English and over 20% in Arabic. It also surpasses GPT-4 by around 9% in UPHILL factual accuracy evaluations and excels in various medical Visual Question Answering, Report Generation, and Report Summarization tasks. The project page, including source code and the trained model, is available at https://github.com/mbzuai-oryx/BiMediX2.

