CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
April 18, 2025
Authors: Yang Yue, Yulin Wang, Chenxin Tao, Pan Liu, Shiji Song, Gao Huang
cs.AI
Abstract
Humans can develop internal world models that encode common-sense knowledge,
telling them how the world works and predicting the consequences of their
actions. This concept has emerged as a promising direction for establishing
general-purpose machine-learning models in recent preliminary works, e.g., for
visual representation learning. In this paper, we present CheXWorld, the first
effort towards a self-supervised world model for radiographic images.
Specifically, our work develops a unified framework that simultaneously models
three aspects of medical knowledge essential for qualified radiologists,
including 1) local anatomical structures describing the fine-grained
characteristics of local tissues (e.g., architectures, shapes, and textures);
2) global anatomical layouts describing the global organization of the human
body (e.g., layouts of organs and skeletons); and 3) domain variations that
encourage CheXWorld to model the transitions across different appearance
domains of radiographs (e.g., varying clarity, contrast, and exposure caused by
collecting radiographs from different hospitals, devices, or patients).
Empirically, we design tailored qualitative and quantitative analyses,
revealing that CheXWorld successfully captures these three dimensions of
medical knowledge. Furthermore, transfer learning experiments across eight
medical image classification and segmentation benchmarks showcase that
CheXWorld significantly outperforms existing self-supervised learning (SSL) methods and large-scale
medical foundation models. Code & pre-trained models are available at
https://github.com/LeapLabTHU/CheXWorld.