Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
March 12, 2025
作者: Sangwon Jang, June Suk Choi, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang
cs.AI
Abstract
Text-to-image diffusion models have achieved remarkable success in generating
high-quality contents from text prompts. However, their reliance on publicly
available data and the growing trend of data sharing for fine-tuning make these
models particularly vulnerable to data poisoning attacks. In this work, we
introduce the Silent Branding Attack, a novel data poisoning method that
manipulates text-to-image diffusion models to generate images containing
specific brand logos or symbols without any text triggers. We find that when
certain visual patterns appear repeatedly in the training data, the model learns
to reproduce them naturally in its outputs, even when the prompt does not mention them.
Leveraging this, we develop an automated data poisoning algorithm that
unobtrusively injects logos into original images, ensuring they blend naturally
and remain undetected. Models trained on this poisoned dataset generate images
containing logos without degrading image quality or text alignment. We
experimentally validate our silent branding attack across two realistic
settings on large-scale high-quality image datasets and style personalization
datasets, achieving high success rates even without a specific text trigger.
Human evaluation and quantitative metrics, including logo detection, show that
our method can stealthily embed logos.
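The core mechanism described above, injecting a recurring visual pattern into training images so the model learns to reproduce it, can be sketched minimally. This is a hypothetical illustration using NumPy arrays as images; the function name `poison_image` and the naive alpha-blend placement are assumptions, whereas the paper's actual algorithm places and blends logos inconspicuously so they remain undetected.

```python
import numpy as np

def poison_image(image, logo, alpha=0.9, rng=None):
    """Blend a small logo patch into a random location of an image.

    A simplified stand-in for the paper's automated poisoning step:
    the same pattern is inserted across many training images so that
    the fine-tuned model reproduces it without any text trigger.
    """
    rng = rng or np.random.default_rng()
    H, W, _ = image.shape
    h, w, _ = logo.shape
    y = rng.integers(0, H - h + 1)
    x = rng.integers(0, W - w + 1)
    poisoned = image.copy()
    region = poisoned[y:y + h, x:x + w]
    poisoned[y:y + h, x:x + w] = (1 - alpha) * region + alpha * logo
    return poisoned

# Poison a fraction of a toy dataset so the pattern recurs during training.
dataset = [np.random.rand(64, 64, 3) for _ in range(10)]
logo = np.ones((8, 8, 3)) * np.array([1.0, 0.0, 0.0])  # placeholder red patch
poisoned_dataset = [poison_image(img, logo) if i % 2 == 0 else img
                    for i, img in enumerate(dataset)]
```

In the attack setting, the blend would be made far subtler than this sketch (e.g., matching local style and texture) so the injected logos evade casual inspection while still recurring often enough for the model to pick them up.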