Imagine yourself: Tuning-Free Personalized Image Generation
September 20, 2024
Authors: Zecheng He, Bo Sun, Felix Juefei-Xu, Haoyu Ma, Ankit Ramchandani, Vincent Cheung, Siddharth Shah, Anmol Kalia, Harihar Subramanyam, Alireza Zareian, Li Chen, Ankit Jain, Ning Zhang, Peizhao Zhang, Roshan Sumbaly, Peter Vajda, Animesh Sinha
cs.AI
Abstract
Diffusion models have demonstrated remarkable efficacy across various
image-to-image tasks. In this research, we introduce Imagine yourself, a
state-of-the-art model designed for personalized image generation. Unlike
conventional tuning-based personalization techniques, Imagine yourself operates
as a tuning-free model, enabling all users to leverage a shared framework
without individualized adjustments. Moreover, previous work struggled to
balance identity preservation, adherence to complex prompts, and good visual
quality, which left models with a strong copy-paste effect from the
reference images. As a result, they can hardly generate images that follow
prompts requiring significant changes to the reference image, e.g., changing
facial expression, head pose, or body pose, and the diversity of the generated
images is low. To address these limitations, our proposed method introduces 1) a new
synthetic paired data generation mechanism to encourage image diversity, 2) a
fully parallel attention architecture with three text encoders and a fully
trainable vision encoder to improve the text faithfulness, and 3) a novel
coarse-to-fine multi-stage finetuning methodology that gradually pushes the
boundary of visual quality. Our study demonstrates that Imagine yourself
surpasses the state-of-the-art personalization model, exhibiting superior
capabilities in identity preservation, visual quality, and text alignment. This
model establishes a robust foundation for various personalization applications.
Human evaluation results confirm that the model outperforms previous
personalization models in all three aspects: identity preservation, text
faithfulness, and visual appeal.