

RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers

February 20, 2025
作者: Ke Cao, Jing Wang, Ao Ma, Jiasong Feng, Zhanjie Zhang, Xuanhua He, Shanyuan Liu, Bo Cheng, Dawei Leng, Yuhui Yin, Jie Zhang
cs.AI

Abstract

The Diffusion Transformer plays a pivotal role in advancing text-to-image and text-to-video generation, owing primarily to its inherent scalability. However, existing controllable diffusion transformer methods incur significant parameter and computational overhead and suffer from inefficient resource allocation because they fail to account for the varying relevance of control information across transformer layers. To address this, we propose RelaCtrl, a Relevance-Guided Efficient Controllable Generation framework that enables efficient, resource-optimized integration of control signals into the Diffusion Transformer. First, we evaluate the relevance of each Diffusion Transformer layer to the control information by assessing the "ControlNet Relevance Score", i.e., the impact of skipping each control layer on both generation quality and control effectiveness during inference. Based on the strength of this relevance, we then tailor the positioning, parameter scale, and modeling capacity of the control layers to reduce unnecessary parameters and redundant computation. Additionally, to further improve efficiency, we replace the self-attention and FFN in the commonly used copy block with the carefully designed Two-Dimensional Shuffle Mixer (TDSM), enabling an efficient implementation of both the token mixer and channel mixer. Both qualitative and quantitative experimental results demonstrate that our approach achieves superior performance with only 15% of the parameters and computational complexity of PixArt-δ. More examples are available at https://relactrl.github.io/RelaCtrl/.
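The relevance-scoring procedure described above (skip one control layer at a time during inference and measure the resulting degradation) can be illustrated with a minimal toy sketch. Everything here is an assumption for illustration only: the toy "layers", the control-injection scheme, and the use of an output-deviation norm as the score stand in for the paper's actual generation-quality and control-effectiveness metrics.

```python
# Hedged sketch of the "ControlNet Relevance Score" idea: ablate each
# control layer in turn and measure how much the model output changes.
# The model here is a toy stack of random linear layers, NOT the paper's
# Diffusion Transformer; names and shapes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_LAYERS, DIM = 6, 8

# Toy backbone: each layer is a random linear map + tanh. A "control
# layer" injects a layer-specific control signal after each layer.
layer_weights = [rng.normal(scale=0.3, size=(DIM, DIM)) for _ in range(N_LAYERS)]
control_signals = [rng.normal(scale=0.1, size=DIM) for _ in range(N_LAYERS)]

def forward(x, skip_control=None):
    """Run the toy model; optionally skip the control injection at one layer."""
    h = x
    for i, w in enumerate(layer_weights):
        h = np.tanh(h @ w)
        if i != skip_control:
            h = h + control_signals[i]  # control branch output added back
    return h

def relevance_scores(x):
    """Per-layer score: output deviation caused by dropping that control layer."""
    full = forward(x)
    return [float(np.linalg.norm(full - forward(x, skip_control=i)))
            for i in range(N_LAYERS)]

x = rng.normal(size=DIM)
scores = relevance_scores(x)
# Layers whose removal barely perturbs the output are candidates for
# smaller or removed control blocks (the resource-allocation idea).
ranked = sorted(range(N_LAYERS), key=lambda i: scores[i])
```

In the paper's actual setting, the deviation norm would be replaced by measurements of generation quality and control fidelity, and the ranking would guide where full-capacity control layers are placed versus lighter TDSM-based blocks.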

