

MedSAM2: Segment Anything in 3D Medical Images and Videos

April 4, 2025
作者: Jun Ma, Zongxin Yang, Sumin Kim, Bihui Chen, Mohammed Baharoon, Adibvafa Fallahpour, Reza Asakereh, Hongwei Lyu, Bo Wang
cs.AI

Abstract

Medical image and video segmentation is a critical task for precision medicine, which has witnessed considerable progress in developing task- or modality-specific and generalist models for 2D images. However, there have been limited studies on building general-purpose models for 3D images and videos with comprehensive user studies. Here, we present MedSAM2, a promptable segmentation foundation model for 3D image and video segmentation. The model is developed by fine-tuning the Segment Anything Model 2 on a large medical dataset with over 455,000 3D image-mask pairs and 76,000 frames, outperforming previous models across a wide range of organs, lesions, and imaging modalities. Furthermore, we implement a human-in-the-loop pipeline to facilitate the creation of large-scale datasets, resulting in, to the best of our knowledge, the most extensive user study to date, involving the annotation of 5,000 CT lesions, 3,984 liver MRI lesions, and 251,550 echocardiogram video frames, demonstrating that MedSAM2 can reduce manual costs by more than 85%. MedSAM2 is also integrated into widely used platforms with user-friendly interfaces for local and cloud deployment, making it a practical tool for supporting efficient, scalable, and high-quality segmentation in both research and healthcare environments.
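
The abstract describes MedSAM2 as a promptable model obtained by fine-tuning Segment Anything Model 2, where a 3D volume is segmented by prompting a single slice and propagating the mask through the remaining slices, the same way SAM 2 tracks an object through video frames. The sketch below is a minimal, hypothetical illustration of that workflow, assuming the released weights can be loaded through the standard SAM 2 video-predictor interface (build_sam2_video_predictor, init_state, add_new_points_or_box, propagate_in_video); the config and checkpoint names, slice directory, slice index, and box coordinates are placeholders, not values taken from the paper.

import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor  # SAM 2 video-predictor entry point

# Placeholder config/checkpoint names -- substitute the actual MedSAM2 release files.
predictor = build_sam2_video_predictor(
    "sam2_hiera_t.yaml",       # model config (placeholder)
    "MedSAM2_checkpoint.pt",   # fine-tuned weights (placeholder)
)

# Treat the 3D volume as an ordered stack of 2D slices, assumed here to be
# pre-exported as image frames in the directory `ct_slices/`.
state = predictor.init_state(video_path="ct_slices/")

# Prompt a single key slice with a bounding box around the target lesion.
box = np.array([120, 90, 210, 180], dtype=np.float32)  # [x_min, y_min, x_max, y_max]
predictor.add_new_points_or_box(state, frame_idx=40, obj_id=1, box=box)

# Propagate the prompt through the remaining slices to assemble the 3D mask.
mask_per_slice = {}
with torch.inference_mode():
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        mask_per_slice[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()

In the upstream SAM 2 predictor, propagation runs forward from the prompted frame by default, and a second pass with reverse=True covers the earlier slices; for video data such as echocardiograms, the frames are used directly instead of exported slices.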

