VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
December 30, 2024
Authors: Shaojin Wu, Fei Ding, Mengqi Huang, Wei Liu, Qian He
cs.AI
Abstract
Although diffusion models are remarkably capable at text-to-image
generation, they may still fail to generate highly aesthetic images. More
specifically, there is still a gap between the generated images and the
real-world aesthetic images in finer-grained dimensions including color,
lighting, composition, etc. In this paper, we propose the Cross-Attention Value
Mixing Control (VMix) Adapter, a plug-and-play aesthetics adapter that improves
the quality of generated images while maintaining generality across visual
concepts. It does so by (1) disentangling the input text prompt into a content
description and an aesthetic description through the initialization of an
aesthetic embedding, and (2) integrating aesthetic conditions into the denoising
process through value-mixed cross-attention, with the network connected by
zero-initialized linear layers. Our key insight is to enhance the aesthetic
presentation of existing diffusion models by designing a superior condition
control method, all while preserving the image-text alignment. Through our
meticulous design, VMix is flexible enough to be applied to community models
for better visual performance without retraining. To validate the effectiveness
of our method, we conducted extensive experiments, showing that VMix
outperforms other state-of-the-art methods and is compatible with other
community modules (e.g., LoRA, ControlNet, and IPAdapter) for image generation.
The project page is https://vmix-diffusion.github.io/VMix/.
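
The abstract gives no implementation details, so the following is only a minimal NumPy sketch of one plausible reading of "value-mixed cross-attention ... connected by zero-initialized linear layers". It assumes that the aesthetic branch reuses the attention map computed from the content prompt, that the aesthetic embedding has the same number of tokens as the content prompt, and that the branch output is added back through a linear layer whose weights start at zero. The function name, argument names, and dimensions are hypothetical and do not come from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def value_mixed_cross_attention(h, text, aes, Wq, Wk, Wv, Wv_aes, W_zero):
    """Hypothetical sketch, not the authors' implementation.

    h:    (n_pix, d)     latent image features
    text: (n_tok, d_txt) content-prompt embedding
    aes:  (n_tok, d_txt) aesthetic embedding (assumed same token count)
    """
    q = h @ Wq
    k = text @ Wk
    v = text @ Wv
    # Attention map from the content prompt, shape (n_pix, n_tok).
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    content_out = attn @ v
    # Assumption: the same attention map is applied to aesthetic values.
    v_aes = aes @ Wv_aes
    # The zero-initialized projection makes this branch a no-op at the start.
    aes_out = (attn @ v_aes) @ W_zero
    return content_out + aes_out

rng = np.random.default_rng(0)
n_pix, n_tok, d, d_txt = 16, 8, 32, 24
h = rng.normal(size=(n_pix, d))
text = rng.normal(size=(n_tok, d_txt))
aes = rng.normal(size=(n_tok, d_txt))
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d_txt, d))
Wv = rng.normal(size=(d_txt, d))
Wv_aes = rng.normal(size=(d_txt, d))
W_trained = 0.1 * rng.normal(size=(d, d))  # stand-in for weights after training

# With the projection at zero, the adapter leaves the base output unchanged.
out_init = value_mixed_cross_attention(
    h, text, aes, Wq, Wk, Wv, Wv_aes, np.zeros((d, d)))
# Reference: same layer with the aesthetic condition zeroed out.
content_only = value_mixed_cross_attention(
    h, text, np.zeros_like(aes), Wq, Wk, Wv, Wv_aes, W_trained)
# Once the projection is non-zero, the aesthetic condition shifts the output.
out_trained = value_mixed_cross_attention(
    h, text, aes, Wq, Wk, Wv, Wv_aes, W_trained)
```

Under these assumptions, `out_init` equals `content_only`, which illustrates why a zero-initialized connection lets an adapter attach to a pretrained model without disturbing its initial behaviour, while `out_trained` differs once the projection has non-zero weights.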