
Precise Parameter Localization for Textual Generation in Diffusion Models

February 14, 2025
Authors: Łukasz Staniszewski, Bartosz Cywiński, Franziska Boenisch, Kamil Deja, Adam Dziedzic
cs.AI

Abstract

Novel diffusion models can synthesize photo-realistic images with integrated high-quality text. Surprisingly, we demonstrate through attention activation patching that less than 1% of diffusion models' parameters, all contained in attention layers, influence the generation of textual content within the images. Building on this observation, we improve textual generation efficiency and performance by targeting the cross- and joint-attention layers of diffusion models. We introduce several applications that benefit from localizing the layers responsible for textual content generation. First, we show that LoRA-based fine-tuning of only the localized layers further enhances the general text-generation capabilities of large diffusion models while preserving the quality and diversity of their generations. Then, we demonstrate how the localized layers can be used to edit textual content in generated images. Finally, we extend this idea to the practical use case of preventing the generation of toxic text at no additional cost. In contrast to prior work, our localization approach is broadly applicable across diffusion model architectures, including U-Net-based models (e.g., LDM, SDXL, and DeepFloyd IF) and transformer-based models (e.g., Stable Diffusion 3), which use diverse text encoders (e.g., from CLIP to large language models like T5). Project page available at https://t2i-text-loc.github.io/.
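
The abstract names attention activation patching as the localization tool. As a rough illustration of the general idea only (not the authors' implementation), the toy sketch below caches each layer's attention output from a run on a source prompt, swaps one cached output at a time into a run on a target prompt, and scores how much of the "text" signal moves with it. Everything here is a hypothetical stand-in: the three-layer model, the two-channel state, the gain values, and the function names `make_layer`, `run`, and `localize`.

```python
# Toy activation patching in plain Python. All names and numbers are
# illustrative assumptions; no real diffusion model is involved.

def make_layer(text_gain, style_gain):
    """Return a fake attention layer mapping a prompt to a 2-channel output
    [text_channel, style_channel]."""
    def attn(prompt):
        return [text_gain * prompt["text"], style_gain * prompt["style"]]
    return attn

# Hypothetical model: only layer 1 writes to the text channel.
LAYERS = [make_layer(0.0, 1.0), make_layer(1.0, 0.0), make_layer(0.0, 0.5)]

def run(prompt, patches=None, cache=None):
    """Sum layer outputs into a state; optionally override (patch) the output
    of chosen layers and/or record every layer's output into `cache`."""
    state = [0.0, 0.0]
    for i, attn in enumerate(LAYERS):
        out = patches[i] if patches and i in patches else attn(prompt)
        if cache is not None:
            cache[i] = out
        state = [s + o for s, o in zip(state, out)]
    return state

def localize(source, target):
    """Score each layer by the fraction of the text-channel gap between the
    target run and the source run that patching that layer alone closes."""
    cache = {}
    src = run(source, cache=cache)   # run whose activations we borrow
    base = run(target)               # unpatched target run
    gap = src[0] - base[0]
    scores = {}
    for i in range(len(LAYERS)):
        patched = run(target, patches={i: cache[i]})
        scores[i] = (patched[0] - base[0]) / gap if gap else 0.0
    return scores

if __name__ == "__main__":
    source = {"text": 3.0, "style": 1.0}
    target = {"text": 7.0, "style": 2.0}
    print(localize(source, target))
```

In this toy setup, a score near 1 for a layer means that patching it alone carries the source prompt's text signal into the target run, and a score near 0 means the layer has no effect on that channel. With a real model, the cached values would be attention-layer activations and the score would come from a text-matching metric on generated images, neither of which this sketch attempts to reproduce.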
