Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging

Abstract

This study presents the Latent Diffusion Autoencoder (LDAE), a novel encoder-decoder diffusion-based framework for efficient and meaningful unsupervised learning in medical imaging, using Alzheimer disease (AD) and brain MR from the ADNI database as a case study. Unlike conventional diffusion autoencoders, which operate in image space, LDAE applies the diffusion process in a compressed latent representation, improving computational efficiency and making representation learning on 3D medical images tractable. To validate the proposed approach, we explore two key hypotheses: (i) LDAE effectively captures meaningful semantic representations of 3D brain MR associated with AD and ageing, and (ii) LDAE achieves high-quality image generation and reconstruction while remaining computationally efficient. Experimental results support both hypotheses through four findings: (i) linear-probe evaluations demonstrate promising diagnostic performance for AD (ROC-AUC: 90%, ACC: 84%) and age prediction (MAE: 4.1 years, RMSE: 5.2 years); (ii) the learned semantic representations enable attribute manipulation, yielding anatomically plausible modifications; (iii) semantic interpolation experiments show strong reconstruction of missing scans, with an SSIM of 0.969 (MSE: 0.0019) for a 6-month gap, and the model maintains robust performance even for longer gaps of 24 months (SSIM > 0.93, MSE < 0.004), indicating an ability to capture temporal progression trends; (iv) compared to conventional diffusion autoencoders, LDAE significantly increases inference throughput (20x faster) while also improving reconstruction quality. These findings position LDAE as a promising framework for scalable medical imaging applications, with the potential to serve as a foundation model for medical image analysis. Code is available at https://github.com/GabrieleLozupone/LDAE
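The semantic interpolation experiment in finding (iii) can be illustrated with a minimal sketch. The abstract does not state how the interpolation is computed, so the linear blend below is an assumption, and the names `lerp`, `interpolation_weight`, and `mse` are hypothetical helpers rather than functions from the LDAE repository. The toy vectors stand in for semantic codes of two acquired scans; in a real pipeline the interpolated code would presumably condition the decoder, and SSIM and MSE would be measured on the reconstructed 3D volume rather than on the vectors themselves.

```python
import numpy as np


def interpolation_weight(t_start: float, t_end: float, t_missing: float) -> float:
    """Fraction of the way from t_start to t_end at which the missing scan falls."""
    if not t_start < t_end:
        raise ValueError("t_start must be earlier than t_end")
    if not t_start <= t_missing <= t_end:
        raise ValueError("t_missing must lie between t_start and t_end")
    return (t_missing - t_start) / (t_end - t_start)


def lerp(z_start: np.ndarray, z_end: np.ndarray, alpha: float) -> np.ndarray:
    """Linear blend of two semantic vectors (assumed interpolation scheme)."""
    return (1.0 - alpha) * z_start + alpha * z_end


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


# Toy stand-ins for the semantic codes of a baseline and a 12-month scan.
rng = np.random.default_rng(0)
z_baseline = rng.normal(size=8)
z_month12 = rng.normal(size=8)

# Estimate the semantic code of a missing 6-month scan.
alpha = interpolation_weight(t_start=0.0, t_end=12.0, t_missing=6.0)
z_month6_estimate = lerp(z_baseline, z_month12, alpha)
```

Weighting by acquisition time means a scan missing at 6 months between baseline and 12 months gets an equal blend of its two neighbours, while a scan closer to one endpoint leans toward that endpoint.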
