UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation

April 9, 2025
Authors: Emmanuelle Bourigault, Amir Jamaludin, Abdullah Hamdi
cs.AI

Abstract

In medical imaging, the primary challenge is collecting large-scale labeled data, which is hindered by privacy concerns, logistics, and high labeling costs. In this work, we present the UK Biobank Organs and Bones (UKBOB) dataset, the largest labeled dataset of body organs, comprising 51,761 3D MRI samples (equivalent to 17.9 million 2D images) and more than 1.37 billion 2D segmentation masks of 72 organs, all based on the UK Biobank MRI dataset. We use automatic labeling, introduce an automated label-cleaning pipeline with organ-specific filters, and manually annotate a subset of 300 MRIs with 11 abdominal classes to validate label quality (referred to as UKBOB-manual). This approach allows the dataset collection to scale while maintaining confidence in the labels. We further confirm the validity of the labels by demonstrating zero-shot generalization of models trained on the filtered UKBOB to other small labeled datasets from similar domains (e.g., abdominal MRI). To further mitigate the effect of noisy labels, we propose a novel method called Entropy Test-time Adaptation (ETTA) to refine the segmentation output. We use UKBOB to train a foundation model, Swin-BOB, for 3D medical image segmentation based on the Swin-UNetr architecture, achieving state-of-the-art results on several 3D medical imaging benchmarks, including the BRATS brain MRI tumor challenge (a 0.4% improvement) and the BTCV abdominal CT scan benchmark (a 1.3% improvement). The pre-trained models and the code are available at https://emmanuelleb985.github.io/ukbob, and the filtered labels will be made available with the UK Biobank.
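
The abstract names Entropy Test-time Adaptation (ETTA) but does not describe how it works. As a rough illustration of the general idea of entropy minimization at test time, and not the authors' method, the sketch below adapts a small per-class bias on a toy set of voxel logits so that the mean prediction entropy does not increase. Everything here is an assumption made for illustration: the helper names (`mean_voxel_entropy`, `adapt_bias`), the choice of a per-class bias as the adapted parameter, the finite-difference gradient, and the toy logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one voxel's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p + eps) for p in probs)

def apply_bias(logit_map, bias):
    """Add a per-class bias to every voxel's logits."""
    return [[z + b for z, b in zip(voxel, bias)] for voxel in logit_map]

def mean_voxel_entropy(logit_map):
    """Average prediction entropy over all voxels."""
    return sum(entropy(softmax(v)) for v in logit_map) / len(logit_map)

def adapt_bias(logit_map, steps=20, lr=0.5, h=1e-4):
    """Hypothetical test-time adaptation: tune a per-class bias to lower
    the mean entropy. A step is accepted only if the entropy decreases;
    otherwise the step size is halved (simple backtracking)."""
    n_classes = len(logit_map[0])
    bias = [0.0] * n_classes

    def loss(b):
        return mean_voxel_entropy(apply_bias(logit_map, b))

    current = loss(bias)
    for _ in range(steps):
        # Central finite-difference estimate of the gradient.
        grad = []
        for k in range(n_classes):
            plus, minus = bias[:], bias[:]
            plus[k] += h
            minus[k] -= h
            grad.append((loss(plus) - loss(minus)) / (2 * h))
        candidate = [b - lr * g for b, g in zip(bias, grad)]
        candidate_loss = loss(candidate)
        if candidate_loss < current:
            bias, current = candidate, candidate_loss
        else:
            lr *= 0.5
    return bias

# Toy "segmentation output": three voxels, three classes (made-up logits).
toy_logits = [[2.0, 0.5, 0.1], [0.3, 0.2, 0.1], [1.0, 1.2, 0.0]]
before = mean_voxel_entropy(toy_logits)
after = mean_voxel_entropy(apply_bias(toy_logits, adapt_bias(toy_logits)))
print(f"mean entropy before: {before:.4f}, after: {after:.4f}")
```

In practice, entropy-based test-time adaptation methods usually update a subset of network parameters by backpropagation rather than a bias on the output logits; the toy bias is used here only to keep the example dependency-free.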
