Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

November 23, 2024
Authors: Chaehun Shin, Jooyoung Choi, Heeseung Kim, Sungroh Yoon
cs.AI

Abstract

Subject-driven text-to-image generation aims to produce images of a new subject within a desired context by accurately capturing both the visual characteristics of the subject and the semantic content of a text prompt. Traditional methods rely on time- and resource-intensive fine-tuning for subject alignment, while recent zero-shot approaches leverage on-the-fly image prompting, often sacrificing subject alignment. In this paper, we introduce Diptych Prompting, a novel zero-shot approach that reinterprets subject-driven generation as an inpainting task with precise subject alignment by leveraging the emergent property of diptych generation in large-scale text-to-image models. Diptych Prompting arranges an incomplete diptych with the reference image in the left panel and performs text-conditioned inpainting on the right panel. We further prevent unwanted content leakage by removing the background in the reference image, and we improve fine-grained details in the generated subject by enhancing attention weights between the panels during inpainting. Experimental results confirm that our approach significantly outperforms zero-shot image prompting methods, producing images that users visually prefer. Additionally, our method supports not only subject-driven generation but also stylized image generation and subject-driven image editing, demonstrating versatility across diverse image generation applications. Project page: https://diptychprompting.github.io/
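
The input arrangement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `remove_background` and `make_incomplete_diptych`, the white background fill, and the assumption that a subject segmentation mask is already available are all hypothetical choices made for illustration. The sketch only builds the incomplete diptych (reference on the left, blank right panel) and the inpainting mask covering the right panel. The text-conditioned inpainting model and the attention enhancement step are out of scope here.

```python
import numpy as np


def remove_background(reference, subject_mask, background_value=255):
    """Replace every non-subject pixel with a flat background color.

    `subject_mask` is a boolean (H, W) array assumed to come from some
    external segmentation step (hypothetical; not specified in the abstract).
    """
    cleaned = reference.copy()
    cleaned[~subject_mask] = background_value
    return cleaned


def make_incomplete_diptych(reference, fill_value=0):
    """Build a two-panel canvas and the mask of the region to inpaint.

    The reference image fills the left panel; the right panel is left blank
    and is marked True in the returned mask so an inpainting model would
    generate only that half.
    """
    height, width, channels = reference.shape
    canvas = np.full((height, 2 * width, channels), fill_value, dtype=reference.dtype)
    canvas[:, :width] = reference          # left panel: reference subject
    inpaint_mask = np.zeros((height, 2 * width), dtype=bool)
    inpaint_mask[:, width:] = True         # right panel: to be generated
    return canvas, inpaint_mask


# Usage with a tiny synthetic reference image (4 x 3 pixels, RGB).
reference = np.full((4, 3, 3), 7, dtype=np.uint8)
subject_mask = np.zeros((4, 3), dtype=bool)
subject_mask[1:3, 1] = True                # pretend the subject is a 2-pixel column
cleaned = remove_background(reference, subject_mask)
canvas, inpaint_mask = make_incomplete_diptych(cleaned)
```

In a full pipeline, `canvas` and `inpaint_mask` would be passed, together with a text prompt describing the diptych, to an inpainting-capable text-to-image model. Which model and prompt wording to use is left open here.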
