Nested Attention: Semantic-aware Attention Values for Concept Personalization
January 2, 2025
Authors: Or Patashnik, Rinon Gal, Daniil Ostashev, Sergey Tulyakov, Kfir Aberman, Daniel Cohen-Or
cs.AI
Abstract
Personalizing text-to-image models to generate images of specific subjects
across diverse scenes and styles is a rapidly advancing field. Current
approaches often face challenges in maintaining a balance between identity
preservation and alignment with the input text prompt. Some methods rely on a
single textual token to represent a subject, which limits expressiveness, while
others employ richer representations but disrupt the model's prior, diminishing
prompt alignment. In this work, we introduce Nested Attention, a novel
mechanism that injects a rich and expressive image representation into the
model's existing cross-attention layers. Our key idea is to generate
query-dependent subject values, derived from nested attention layers that learn
to select relevant subject features for each region in the generated image. We
integrate these nested layers into an encoder-based personalization method, and
show that they enable high identity preservation while adhering to input text
prompts. Our approach is general and can be trained on various domains.
Additionally, its prior preservation allows us to combine multiple personalized
subjects from different domains in a single image.
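To make the mechanism concrete, below is a minimal, illustrative PyTorch sketch of the core idea described in the abstract: a nested attention layer that attends from each image query to subject encoder features, producing a query-dependent value that stands in for the subject token inside the host cross-attention. The module and function names, tensor shapes, and the exact injection scheme are assumptions made for illustration; this is not the authors' implementation.

```python
import math
import torch
import torch.nn as nn


class NestedAttention(nn.Module):
    """Sketch: produce a query-dependent subject value by attending from each
    image query to subject encoder features. Shapes and projections are
    illustrative assumptions, not the paper's code."""

    def __init__(self, query_dim: int, feat_dim: int, value_dim: int):
        super().__init__()
        self.to_k = nn.Linear(feat_dim, query_dim, bias=False)  # keys from subject features
        self.to_v = nn.Linear(feat_dim, value_dim, bias=False)  # values from subject features
        self.scale = query_dim ** -0.5

    def forward(self, queries: torch.Tensor, subject_feats: torch.Tensor) -> torch.Tensor:
        # queries:       (B, P, query_dim) -- the host cross-attention's queries
        # subject_feats: (B, N, feat_dim)  -- e.g. patch features from an image encoder
        k = self.to_k(subject_feats)                             # (B, N, query_dim)
        v = self.to_v(subject_feats)                             # (B, N, value_dim)
        attn = torch.softmax(queries @ k.transpose(1, 2) * self.scale, dim=-1)
        # Each query location selects the subject features most relevant to its region.
        return attn @ v                                          # (B, P, value_dim)


def inject_subject_values(q, k, v, subject_token_idx, nested_values):
    """Host cross-attention in which the subject token's value is replaced,
    per query, by the nested attention output (an assumed injection scheme).
    q: (B, P, D), k/v: (B, T, D), nested_values: (B, P, D)."""
    scores = torch.softmax(q @ k.transpose(1, 2) / math.sqrt(q.shape[-1]), dim=-1)  # (B, P, T)
    out = scores @ v                                             # standard attention output
    w = scores[..., subject_token_idx].unsqueeze(-1)             # (B, P, 1) weight on subject token
    # Swap the subject token's fixed value for its query-dependent value.
    return out - w * v[:, subject_token_idx].unsqueeze(1) + w * nested_values
```

In an encoder-based setup like the one the abstract describes, such a layer would sit alongside each cross-attention block of the denoising network, with `subject_feats` produced by a pretrained image encoder applied to the reference image of the subject.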