UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation

April 9, 2025
Authors: Emmanuelle Bourigault, Amir Jamaludin, Abdullah Hamdi
cs.AI

Abstract

In medical imaging, the primary challenge is collecting large-scale labeled data due to privacy concerns, logistics, and high labeling costs. In this work, we present the UK Biobank Organs and Bones (UKBOB), the largest labeled dataset of body organs, comprising 51,761 MRI 3D samples (equivalent to 17.9 million 2D images) and more than 1.37 billion 2D segmentation masks of 72 organs, all based on the UK Biobank MRI dataset. We utilize automatic labeling, introduce an automated label cleaning pipeline with organ-specific filters, and manually annotate a subset of 300 MRIs with 11 abdominal classes to validate label quality (referred to as UKBOB-manual). This approach allows for scaling up the dataset collection while maintaining confidence in the labels. We further confirm the validity of the labels by demonstrating zero-shot generalization of models trained on the filtered UKBOB to other small labeled datasets from similar domains (e.g., abdominal MRI). To further mitigate the effect of noisy labels, we propose a novel method called Entropy Test-time Adaptation (ETTA) to refine the segmentation output. We use UKBOB to train a foundation model, Swin-BOB, for 3D medical image segmentation based on the Swin-UNetr architecture, achieving state-of-the-art results in several benchmarks in 3D medical imaging, including the BRATS brain MRI tumor challenge (with a 0.4% improvement) and the BTCV abdominal CT scan benchmark (with a 1.3% improvement). The pre-trained models and the code are available at https://emmanuelleb985.github.io/ukbob, and the filtered labels will be made available with the UK Biobank.
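
The abstract does not describe how ETTA works internally. As general background only, entropy-based test-time adaptation methods typically treat the per-voxel Shannon entropy of the predicted class probabilities as an uncertainty signal. The following is a minimal NumPy sketch of that quantity under stated assumptions: the function names, the class-first array layout, and the random toy input are all hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def voxel_entropy(logits, axis=0, eps=1e-12):
    """Shannon entropy (in nats) of the class distribution at each voxel.

    `logits` is assumed to have shape (num_classes, D, H, W); this layout
    is an illustrative assumption, not something stated in the abstract.
    """
    probs = softmax(logits, axis=axis)
    # eps guards against log(0) for classes with vanishing probability.
    return -(probs * np.log(probs + eps)).sum(axis=axis)

# Toy input: 3 classes over a 2x2x2 volume of random logits.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 2, 2, 2))

entropy = voxel_entropy(logits)        # one entropy value per voxel
mean_entropy = float(entropy.mean())   # scalar summary over the volume
```

Entropy is close to zero where the model is confident and reaches log(num_classes) where the prediction is uniform, so a mean like `mean_entropy` is the kind of scalar an entropy-based adaptation scheme could try to reduce at test time.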
