
Generalizable Origin Identification for Text-Guided Image-to-Image Diffusion Models

January 4, 2025
Authors: Wenhao Wang, Yifan Sun, Zongxin Yang, Zhentao Tan, Zhengdong Hu, Yi Yang
cs.AI

Abstract

Text-guided image-to-image diffusion models excel at translating images based on textual prompts, allowing for precise and creative visual modifications. However, such a powerful technique can be misused for spreading misinformation, infringing on copyrights, and evading content tracing. This motivates us to introduce the task of origin IDentification for text-guided Image-to-image Diffusion models (ID^2), which aims to retrieve the original image of a given translated query. A straightforward solution to ID^2 involves training a specialized deep embedding model to extract and compare features from both query and reference images. However, due to the visual discrepancies across images produced by different diffusion models, this similarity-based approach fails when trained on images from one model and tested on those from another, limiting its effectiveness in real-world applications. To address this challenge of the proposed ID^2 task, we contribute the first dataset and a theoretically guaranteed method, both emphasizing generalizability. The curated dataset, OriPID, contains abundant Origins and guided Prompts, which can be used to train and test potential IDentification models across various diffusion models. On the method side, we first prove the existence of a linear transformation that minimizes the distance between the pre-trained Variational Autoencoder (VAE) embeddings of generated samples and their origins. We then demonstrate that such a simple linear transformation generalizes across different diffusion models. Experimental results show that the proposed method achieves satisfactory generalization performance, significantly surpassing similarity-based methods (+31.6% mAP), even those with generalization designs.
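To make the abstract's core idea concrete, here is a minimal sketch of the linear-transformation approach it describes: fit a linear map that sends VAE embeddings of generated images toward the embeddings of their origins, then retrieve the origin of a query by nearest-neighbor search over the mapped embedding. All array shapes, the least-squares fitting, the cosine-similarity retrieval, and the function names below are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Sketch of ID^2 via a learned linear map over VAE embeddings.
# Assumed inputs (hypothetical names/shapes):
#   Z_gen:    (N, D) flattened VAE embeddings of generated (translated) images
#   Z_origin: (N, D) VAE embeddings of their corresponding origin images
import numpy as np

def fit_linear_map(Z_gen: np.ndarray, Z_origin: np.ndarray) -> np.ndarray:
    """Least-squares W minimizing ||Z_gen @ W - Z_origin||_F^2."""
    W, *_ = np.linalg.lstsq(Z_gen, Z_origin, rcond=None)
    return W  # shape (D, D)

def retrieve_origin(z_query: np.ndarray, Z_refs: np.ndarray, W: np.ndarray) -> int:
    """Map the query embedding with W, then return the index of the
    nearest reference embedding by cosine similarity."""
    q = z_query @ W
    q = q / np.linalg.norm(q)
    refs = Z_refs / np.linalg.norm(Z_refs, axis=1, keepdims=True)
    return int(np.argmax(refs @ q))

if __name__ == "__main__":
    # Toy check with synthetic embeddings: generated embeddings are a noisy
    # linear distortion of the origins, so W should approximately invert it.
    rng = np.random.default_rng(0)
    Z_origin = rng.normal(size=(100, 16))
    Z_gen = Z_origin @ rng.normal(size=(16, 16)) + 0.01 * rng.normal(size=(100, 16))
    W = fit_linear_map(Z_gen, Z_origin)
    print(retrieve_origin(Z_gen[7], Z_origin, W))  # ideally prints 7
```

The generalization claim in the abstract would correspond, in this sketch, to a W fitted on one diffusion model's outputs still ranking the true origin first for queries generated by a different model.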

