Generalizable Origin Identification for Text-Guided Image-to-Image Diffusion Models
January 4, 2025
Authors: Wenhao Wang, Yifan Sun, Zongxin Yang, Zhentao Tan, Zhengdong Hu, Yi Yang
cs.AI
Abstract
Text-guided image-to-image diffusion models excel in translating images based
on textual prompts, allowing for precise and creative visual modifications.
However, such a powerful technique can be misused for spreading misinformation,
infringing on copyrights, and evading content tracing. This motivates us to
introduce the task of origin IDentification for text-guided Image-to-image
Diffusion models (ID^2), aiming to retrieve the original image of a given
translated query. A straightforward solution to ID^2 involves training a
specialized deep embedding model to extract and compare features from both
query and reference images. However, due to visual discrepancy across
generations produced by different diffusion models, this similarity-based
approach fails when training on images from one model and testing on those from
another, limiting its effectiveness in real-world applications. To solve this
challenge of the proposed ID^2 task, we contribute the first dataset and a
theoretically guaranteed method, both emphasizing generalizability. The curated
dataset, OriPID, contains abundant Origins and guided Prompts, which can be
used to train and test potential IDentification models across various diffusion
models. In the method section, we first prove the existence of a linear
transformation that minimizes the distance between the pre-trained Variational
Autoencoder (VAE) embeddings of generated samples and their origins.
Subsequently, it is demonstrated that such a simple linear transformation can
be generalized across different diffusion models. Experimental results show
that the proposed method achieves satisfactory generalization performance,
significantly surpassing similarity-based methods (+31.6% mAP), even those
with generalization designs.
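To make the retrieval pipeline concrete, below is a minimal sketch of the idea as stated in the abstract: embed images with a pre-trained VAE encoder, fit a single linear transformation that maps generated-image embeddings toward their origin embeddings, and retrieve origins by nearest neighbour in the transformed space. This is an illustrative assumption of how such a pipeline could look, not the paper's implementation: the VAE encoder is stubbed out with random vectors, and all names (embed_dim, n_pairs, retrieve, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, n_pairs, n_refs = 64, 500, 1000

# Stand-ins for pre-trained VAE embeddings: z_gen[i] is the embedding of a
# translated (generated) image, z_orig[i] the embedding of its true origin.
z_gen = rng.normal(size=(n_pairs, embed_dim))
z_orig = rng.normal(size=(n_pairs, embed_dim))

# Closed-form least-squares fit of a linear map W minimizing
# sum_i || z_gen[i] @ W - z_orig[i] ||^2 over the training pairs.
W, *_ = np.linalg.lstsq(z_gen, z_orig, rcond=None)

def retrieve(query_emb, reference_embs, k=5):
    """Rank candidate origin embeddings by L2 distance to the transformed query."""
    projected = query_emb @ W
    dists = np.linalg.norm(reference_embs - projected, axis=1)
    return np.argsort(dists)[:k]

# Toy usage: rank candidate origin embeddings for one translated query.
references = rng.normal(size=(n_refs, embed_dim))
print(retrieve(z_gen[0], references))
```

Because the transformation is linear, it can be fit in closed form from embedding pairs and then applied to queries from diffusion models unseen during fitting, which is the generalization property the abstract emphasizes.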