CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning

Abstract

Humans can develop internal world models that encode common-sense knowledge, telling them how the world works and predicting the consequences of their actions. This concept has emerged in recent preliminary works as a promising direction for building general-purpose machine learning models, e.g., for visual representation learning. In this paper, we present CheXWorld, the first effort towards a self-supervised world model for radiographic images. Specifically, our work develops a unified framework that simultaneously models three aspects of medical knowledge essential for qualified radiologists: 1) local anatomical structures, describing the fine-grained characteristics of local tissues (e.g., architectures, shapes, and textures); 2) global anatomical layouts, describing the global organization of the human body (e.g., layouts of organs and skeletons); and 3) domain variations, which encourage CheXWorld to model the transitions across different appearance domains of radiographs (e.g., varying clarity, contrast, and exposure caused by collecting radiographs from different hospitals, devices, or patients). We design tailored qualitative and quantitative analyses, which show empirically that CheXWorld successfully captures these three dimensions of medical knowledge. Furthermore, transfer learning experiments across eight medical image classification and segmentation benchmarks show that CheXWorld significantly outperforms existing self-supervised learning (SSL) methods and large-scale medical foundation models. Code and pre-trained models are available at https://github.com/LeapLabTHU/CheXWorld.
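
The third dimension, domain variation, can be pictured as two views of the same radiograph rendered under different appearance settings, with a model asked to relate one to the other. The sketch below is only an illustration of that idea and is not taken from the CheXWorld code: the function names, the parameter ranges, and the use of gamma, gain, and a box blur as stand-ins for contrast, exposure, and clarity are all assumptions, and the input is a synthetic array rather than a real radiograph.

```python
import numpy as np


def box_blur(img, k):
    """Blur a 2-D image with a k x k mean filter (k odd), reflecting at the edges."""
    if k <= 1:
        return img.copy()
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=np.float64)
    # Sum the k*k shifted copies of the padded image, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def random_appearance(img, rng):
    """Return img in a randomly drawn appearance domain, plus the drawn parameters.

    Hypothetical helper: the ranges below are illustrative assumptions.
    """
    params = {
        "gamma": rng.uniform(0.7, 1.5),      # proxy for contrast
        "gain": rng.uniform(0.8, 1.2),       # proxy for exposure
        "blur": int(rng.choice([1, 3, 5])),  # proxy for clarity (kernel size)
    }
    out = box_blur(img, params["blur"])
    out = np.clip(out, 0.0, 1.0) ** params["gamma"]
    out = np.clip(out * params["gain"], 0.0, 1.0)
    return out, params


rng = np.random.default_rng(0)
image = rng.random((64, 64))  # synthetic stand-in for a radiograph in [0, 1]

# Two views of the same content in different appearance domains.
source_view, source_params = random_appearance(image, rng)
target_view, target_params = random_appearance(image, rng)
```

In a setup of this kind, a predictor would receive features of `source_view` together with the two parameter sets and be trained to predict features of `target_view`; how CheXWorld actually conditions its predictor is described in the paper and repository, not here.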
