MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation

October 3, 2024
作者: Gurucharan Marthi Krishna Kumar, Aman Chadha, Janine Mendola, Amir Shmuel
cs.AI

Abstract

Large Language Models (LLMs), known for their versatility in textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study explores enhancing Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block for aggregating features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79 and improvements in accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformers in refining medical image segmentation, highlighting their potential to significantly boost model accuracy and robustness. The source code and our implementation are available at: https://bit.ly/3zf2CVs
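
To make the core idea concrete, below is a minimal PyTorch sketch of how a frozen, pre-trained transformer block can be inserted into a ViT encoder behind small trainable projections. This is a sketch under stated assumptions, not the paper's implementation: the class and parameter names (FrozenLLMBlockAdapter, vit_dim, llm_dim) are invented for illustration, and a generic nn.TransformerEncoderLayer stands in for an actual LLM layer so the example runs without downloading weights. The authors' actual code is available at the link above.

```python
import torch
import torch.nn as nn

class FrozenLLMBlockAdapter(nn.Module):
    """Illustrative adapter: runs ViT patch tokens through a frozen
    transformer block taken from a pre-trained LLM. Trainable linear
    projections bridge the ViT embedding width and the LLM hidden size."""

    def __init__(self, llm_block: nn.Module, vit_dim: int, llm_dim: int):
        super().__init__()
        self.in_proj = nn.Linear(vit_dim, llm_dim)    # trainable
        self.out_proj = nn.Linear(llm_dim, vit_dim)   # trainable
        self.llm_block = llm_block
        for p in self.llm_block.parameters():         # keep LLM weights frozen
            p.requires_grad = False

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches, vit_dim) from the ViT encoder
        h = self.in_proj(tokens)          # project up to the LLM width
        h = self.llm_block(h)             # frozen LLM transformer block
        return tokens + self.out_proj(h)  # project back, add residual

# A generic encoder layer stands in for a real pre-trained LLM layer,
# purely so this sketch is self-contained and runnable.
llm_dim = 2048
stand_in_llm_block = nn.TransformerEncoderLayer(
    d_model=llm_dim, nhead=16, batch_first=True)

adapter = FrozenLLMBlockAdapter(stand_in_llm_block, vit_dim=768, llm_dim=llm_dim)
vit_tokens = torch.randn(2, 196, 768)     # e.g. 14x14 patches at ViT-Base width
out = adapter(vit_tokens)                 # same shape as the input tokens
```

Freezing the LLM block and training only the two projections keeps the number of trainable parameters small, which is one plausible reason such a hybrid can remain robust on comparatively small medical imaging datasets.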
