DDT: Decoupled Diffusion Transformer

April 8, 2025
Authors: Shuai Wang, Zhi Tian, Weilin Huang, Limin Wang
cs.AI

Abstract

Diffusion transformers have demonstrated remarkable generation quality, albeit requiring longer training iterations and numerous inference steps. In each denoising step, diffusion transformers encode the noisy inputs to extract the lower-frequency semantic component and then decode the higher-frequency component with identical modules. This scheme creates an inherent optimization dilemma: encoding low-frequency semantics necessitates reducing high-frequency components, creating tension between semantic encoding and high-frequency decoding. To resolve this challenge, we propose a new Decoupled Diffusion Transformer (DDT), with a decoupled design of a dedicated condition encoder for semantic extraction alongside a specialized velocity decoder. Our experiments reveal that a more substantial encoder yields performance improvements as model size increases. For ImageNet 256×256, our DDT-XL/2 achieves a new state-of-the-art performance of 1.31 FID (nearly 4× faster training convergence compared to previous diffusion transformers). For ImageNet 512×512, our DDT-XL/2 achieves a new state-of-the-art FID of 1.28. Additionally, as a beneficial by-product, our decoupled architecture enhances inference speed by enabling the sharing of the self-condition between adjacent denoising steps. To minimize performance degradation, we propose a novel statistical dynamic programming approach to identify optimal sharing strategies.
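
The idea of sharing the self-condition between adjacent denoising steps can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the paper: the function name `sample_with_shared_condition`, the toy `toy_encoder` and `toy_decoder`, the plain Euler update on a velocity, and the hand-written `recompute` schedule. In particular, the schedule is fixed by hand and is not the statistical dynamic programming method the abstract mentions.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sample_with_shared_condition(
    x: Vector,
    timesteps: Sequence[float],
    encoder: Callable[[Vector, float], Vector],
    decoder: Callable[[Vector, float, Vector], Vector],
    recompute: Sequence[bool],
) -> Tuple[Vector, int]:
    """Euler sampling loop that reuses the encoder output on selected steps.

    recompute[i] says whether step i runs the (assumed expensive) encoder
    or reuses the condition computed at an earlier step.
    """
    if len(recompute) != len(timesteps) - 1:
        raise ValueError("need one recompute flag per step")
    if not recompute[0]:
        raise ValueError("the first step must compute a condition")

    z: Vector = []
    encoder_calls = 0
    for i in range(len(timesteps) - 1):
        t, t_next = timesteps[i], timesteps[i + 1]
        if recompute[i]:
            z = encoder(x, t)  # fresh self-condition
            encoder_calls += 1
        v = decoder(x, t, z)  # the decoder runs on every step
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]
    return x, encoder_calls


def toy_encoder(x: Vector, t: float) -> Vector:
    # Hypothetical stand-in: summarize the input by its mean.
    return [sum(x) / len(x)]


def toy_decoder(x: Vector, t: float, z: Vector) -> Vector:
    # Hypothetical stand-in: velocity pulls each entry toward the summary.
    return [z[0] - xi for xi in x]


if __name__ == "__main__":
    steps = [0.0, 0.25, 0.5, 0.75, 1.0]
    schedule = [True, False, True, False]  # share on every second step
    sample, calls = sample_with_shared_condition(
        [0.0, 2.0], steps, toy_encoder, toy_decoder, schedule
    )
    print(sample, calls)
```

In this sketch the decoder still runs at every step, while the encoder runs only where `recompute` is true, so any saving would come from the skipped encoder calls. Which steps can share a condition without hurting quality is the question the paper's sharing strategy addresses.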

