REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

April 14, 2025
Authors: Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, Liang Zheng
cs.AI

Abstract

In this paper we tackle a fundamental question: "Can we train latent diffusion models together with the variational auto-encoder (VAE) tokenizer in an end-to-end manner?" Traditional deep-learning wisdom dictates that end-to-end training is often preferable when possible. However, for latent diffusion transformers, it is observed that training both the VAE and the diffusion model end-to-end with the standard diffusion loss is ineffective, even causing a degradation in final performance. We show that while the diffusion loss is ineffective, end-to-end training can be unlocked through the representation-alignment (REPA) loss, allowing both the VAE and the diffusion model to be jointly tuned during training. Despite its simplicity, the proposed training recipe (REPA-E) shows remarkable performance, speeding up diffusion model training by over 17x and 45x compared to the REPA and vanilla training recipes, respectively. Interestingly, we observe that end-to-end tuning with REPA-E also improves the VAE itself, leading to an improved latent space structure and better downstream generation performance. In terms of final performance, our approach sets a new state of the art, achieving FID scores of 1.26 and 1.83 with and without classifier-free guidance on ImageNet 256 x 256. Code is available at https://end2end-diffusion.github.io.
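To make the recipe concrete, here is a minimal PyTorch-style sketch of what one REPA-E training step could look like. It is an illustration inferred from the abstract, not the authors' released code (available at https://end2end-diffusion.github.io): the module interfaces (`vae.encode`, `diffusion_model(..., return_hidden=True)`, the frozen `encoder`, the `proj` head), the toy noise schedule, and the loss weight `lambda_repa` are all assumptions, and the stop-gradient placement reflects one reading of the abstract's claim that the raw diffusion loss degrades the VAE while the REPA loss provides the useful end-to-end signal.

```python
import torch
import torch.nn.functional as F

def repa_e_step(x, vae, diffusion_model, proj, encoder, opt,
                lambda_repa=0.5, num_steps=1000):
    """One hypothetical joint update of the VAE and the diffusion transformer."""
    z = vae.encode(x)  # latents; gradients may flow back into the VAE

    # Sample timesteps and noise the latents with a toy linear schedule
    # (the actual schedule is whatever the paper's diffusion setup uses).
    t = torch.randint(0, num_steps, (x.size(0),), device=x.device)
    a = torch.linspace(0.9999, 0.02, num_steps, device=x.device)[t]
    a = a.view(-1, *([1] * (z.dim() - 1)))
    noise = torch.randn_like(z)

    # Diffusion-loss branch: z is detached so the standard diffusion loss,
    # which the paper finds degrades the VAE, does not update it.
    pred = diffusion_model(a.sqrt() * z.detach() + (1 - a).sqrt() * noise, t)
    loss_diff = F.mse_loss(pred, noise)

    # REPA branch: align intermediate transformer features with a frozen
    # pretrained visual encoder; this loss does back-propagate into the VAE.
    _, hidden = diffusion_model(a.sqrt() * z + (1 - a).sqrt() * noise, t,
                                return_hidden=True)
    with torch.no_grad():
        target = encoder(x)  # frozen self-supervised features
    loss_repa = -F.cosine_similarity(proj(hidden), target, dim=-1).mean()

    loss = loss_diff + lambda_repa * loss_repa
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss_diff.item(), loss_repa.item()
```

A single optimizer over the VAE, the transformer, and the projection head keeps the sketch short; the choice of alignment layer, encoder, loss weight, and any per-component learning rates should follow the paper rather than this sketch.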
