Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging
April 11, 2025
Authors: Gabriele Lozupone, Alessandro Bria, Francesco Fontanella, Frederick J. A. Meijer, Claudio De Stefano, Henkjan Huisman
cs.AI
Abstract
This study presents Latent Diffusion Autoencoder (LDAE), a novel
encoder-decoder diffusion-based framework for efficient and meaningful
unsupervised learning in medical imaging, focusing on Alzheimer's disease (AD)
using brain MR images from the ADNI database as a case study. Unlike conventional
diffusion autoencoders operating in image space, LDAE applies the diffusion
process in a compressed latent representation, improving computational
efficiency and making 3D medical imaging representation learning tractable. To
validate the proposed approach, we explore two key hypotheses: (i) LDAE
effectively captures meaningful semantic representations on 3D brain MR
associated with AD and ageing, and (ii) LDAE achieves high-quality image
generation and reconstruction while being computationally efficient.
Experimental results support both hypotheses: (i) linear-probe evaluations
demonstrate promising diagnostic performance for AD (ROC-AUC: 90%, ACC: 84%)
and age prediction (MAE: 4.1 years, RMSE: 5.2 years); (ii) the learned semantic
representations enable attribute manipulation, yielding anatomically plausible
modifications; (iii) semantic interpolation experiments show strong
reconstruction of missing scans, with SSIM of 0.969 (MSE: 0.0019) for a 6-month
gap. Even for longer gaps (24 months), the model maintains robust performance
(SSIM > 0.93, MSE < 0.004), indicating an ability to capture temporal
progression trends; (iv) compared to conventional diffusion autoencoders, LDAE
significantly increases inference throughput (20x faster) while also enhancing
reconstruction quality. These findings position LDAE as a promising framework
for scalable medical imaging applications, with the potential to serve as a
foundation model for medical image analysis. Code available at
https://github.com/GabrieleLozupone/LDAE
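
The semantic interpolation experiment described above can be illustrated with a minimal sketch. It assumes that a missing scan's semantic vector is estimated by plain linear interpolation between the vectors of two acquired scans, weighted by acquisition time; the abstract does not state the exact interpolation scheme, so this is an illustrative assumption rather than the authors' implementation. The function names `interpolate_semantic` and `mse` and the toy vectors are hypothetical, and the decoding step that turns a semantic vector back into an image is omitted.

```python
from typing import List, Sequence


def interpolate_semantic(z_a: Sequence[float], z_b: Sequence[float],
                         t_a: float, t_b: float, t: float) -> List[float]:
    """Linearly interpolate two semantic vectors at time t (assumed scheme)."""
    if len(z_a) != len(z_b):
        raise ValueError("semantic vectors must have the same length")
    if t_a == t_b:
        raise ValueError("the two scans must have different time points")
    alpha = (t - t_a) / (t_b - t_a)  # 0 at t_a, 1 at t_b
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between two flattened arrays of equal size."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Toy example: semantic vectors at month 0 and month 12 (made-up values),
# used to estimate the vector of a missing month-6 scan.
z_month0 = [0.0, 1.0, 2.0]
z_month12 = [1.0, 1.0, 0.0]
z_month6 = interpolate_semantic(z_month0, z_month12, 0.0, 12.0, 6.0)
print(z_month6)  # [0.5, 1.0, 1.0]
```

In a full pipeline the interpolated vector would condition the decoder, and the reconstructed volume would then be compared with the held-out scan using metrics such as MSE and SSIM.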