Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
March 12, 2025
Authors: Yifan Zhou, Zeqi Xiao, Shuai Yang, Xingang Pan
cs.AI
Abstract
Latent Diffusion Models (LDMs) are known to have an unstable generation
process, where even small perturbations or shifts in the input noise can lead
to significantly different outputs. This limits their use in applications
requiring consistent results. In this work, we redesign LDMs to
enhance consistency by making them shift-equivariant. While introducing
anti-aliasing operations can partially improve shift-equivariance, significant
aliasing and inconsistency persist due to the unique challenges in LDMs,
including 1) aliasing amplification during VAE training and multiple U-Net
inferences, and 2) self-attention modules that inherently lack
shift-equivariance. To address these issues, we redesign the attention modules
to be shift-equivariant and propose an equivariance loss that effectively
suppresses the frequency bandwidth of the features in the continuous domain.
The resulting alias-free LDM (AF-LDM) achieves strong shift-equivariance and is
also robust to irregular warping. Extensive experiments demonstrate that AF-LDM
produces significantly more consistent results than vanilla LDM across various
applications, including video editing and image-to-image translation. Code is
available at: https://github.com/SingleZombie/AFLDM
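The shift-equivariance property at the heart of the abstract can be made concrete with a short sketch. Below is a minimal illustration in PyTorch (not the authors' released code; the network `f`, the offset range, and the loss weighting are assumptions) of how fractional shifts can be realized as Fourier-domain phase shifts, how the equivariance error of a resolution-preserving latent-space network `f` (e.g. one U-Net denoising step) can be measured, and how the same mismatch can serve as a training penalty in the spirit of the abstract's equivariance loss:

```python
# Minimal sketch, assuming PyTorch; not the paper's implementation.
import torch
import torch.nn.functional as F


def fractional_shift(x: torch.Tensor, dx: float, dy: float) -> torch.Tensor:
    """Shift a (B, C, H, W) tensor by a sub-pixel offset via an FFT phase
    shift, i.e. ideal band-limited interpolation."""
    _, _, H, W = x.shape
    fy = torch.fft.fftfreq(H, device=x.device)  # cycles per pixel, vertical
    fx = torch.fft.fftfreq(W, device=x.device)  # cycles per pixel, horizontal
    phase = torch.exp(-2j * torch.pi * (fy[:, None] * dy + fx[None, :] * dx))
    return torch.fft.ifft2(torch.fft.fft2(x) * phase).real


@torch.no_grad()
def equivariance_error(f, z: torch.Tensor, dx: float, dy: float) -> float:
    """RMS of f(shift(z)) - shift(f(z)); zero for a perfectly equivariant f."""
    diff = f(fractional_shift(z, dx, dy)) - fractional_shift(f(z), dx, dy)
    return diff.pow(2).mean().sqrt().item()


def equivariance_loss(f, z: torch.Tensor, max_shift: float = 1.0) -> torch.Tensor:
    """Penalize the same mismatch at random fractional offsets (a hedged
    reading of the abstract's equivariance loss, not its exact recipe)."""
    dx, dy = ((torch.rand(2) * 2 - 1) * max_shift).tolist()
    return F.mse_loss(f(fractional_shift(z, dx, dy)),
                      fractional_shift(f(z), dx, dy))
```

Because the Fourier phase shift is exact only for band-limited signals, penalizing this mismatch implicitly pressures the network's features to stay within the representable bandwidth, which matches the abstract's description of the loss suppressing feature bandwidth in the continuous domain.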
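Self-attention mixes information across all spatial positions, which is why the abstract singles it out as inherently lacking shift-equivariance. One way to restore the property (a hedged sketch of the general idea, not necessarily the paper's exact module; `reference_attention` is a hypothetical helper) is to draw keys and values from a fixed reference feature map, so each output token depends only on its own query and the shared reference and therefore moves together with the query grid:

```python
# Hedged sketch: attention with keys/values from a fixed reference feature
# map. Warping the query features then warps the output identically,
# because each output token depends only on its own query.
import torch


def reference_attention(q_feat: torch.Tensor,
                        ref_feat: torch.Tensor,
                        wq: torch.Tensor, wk: torch.Tensor,
                        wv: torch.Tensor) -> torch.Tensor:
    """q_feat, ref_feat: (B, N, C) token features; wq/wk/wv: (C, C) projections.
    Hypothetical helper for illustration, not an API from the paper's code."""
    q = q_feat @ wq                 # queries follow the (shifted) input
    k = ref_feat @ wk               # keys and values come from the reference
    v = ref_feat @ wv
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(1, 2) * scale, dim=-1)
    return attn @ v                 # warping q warps the output token-for-token
```

With queries, keys, and values all drawn from the same input, integer circular shifts merely permute tokens, but fractional shifts alter every token's value, so the attention weights no longer track the shift; decoupling keys and values from the shifted input sidesteps that failure mode.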