MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
October 17, 2024
Authors: Donghao Zhou, Jiancheng Huang, Jinbin Bai, Jiaze Wang, Hao Chen, Guangyong Chen, Xiaowei Hu, Pheng-Ann Heng
cs.AI
Abstract
Recent advancements in text-to-image (T2I) diffusion models have enabled the
creation of high-quality images from text prompts, but they still struggle to
generate images with precise control over specific visual concepts. Existing
approaches can replicate a given concept by learning from reference images, yet
they lack the flexibility for fine-grained customization of individual
components within the concept. In this paper, we introduce
component-controllable personalization, a novel task that pushes the boundaries
of T2I models by allowing users to reconfigure specific components when
personalizing visual concepts. This task is particularly challenging due to two
primary obstacles: semantic pollution, where unwanted visual elements corrupt
the personalized concept, and semantic imbalance, which causes disproportionate
learning of the concept and component. To overcome these challenges, we design
MagicTailor, an innovative framework that leverages Dynamic Masked Degradation
(DM-Deg) to dynamically perturb undesired visual semantics and Dual-Stream
Balancing (DS-Bal) to establish a balanced learning paradigm for desired visual
semantics. Extensive comparisons, ablations, and analyses demonstrate that
MagicTailor not only excels in this challenging task but also holds significant
promise for practical applications, paving the way for more nuanced and
creative image generation.
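The abstract does not say how DM-Deg perturbs undesired visual semantics. Purely as an illustration of the general idea, here is a minimal sketch, assuming that "degradation" means adding Gaussian noise to image regions outside a binary mask of the desired concept/component, with a noise intensity that decays as training progresses. The function names, the mask convention (1 = keep, 0 = perturb), `alpha_init`, `gamma`, and the polynomial decay schedule are all hypothetical and are not taken from the paper.

```python
import numpy as np


def degradation_intensity(step: int, total_steps: int,
                          alpha_init: float = 0.5, gamma: float = 2.0) -> float:
    """Hypothetical schedule: noise intensity decays from alpha_init to 0 over training."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return alpha_init * (1.0 - progress) ** gamma


def masked_degradation(image: np.ndarray, mask: np.ndarray, step: int,
                       total_steps: int, rng: np.random.Generator,
                       alpha_init: float = 0.5, gamma: float = 2.0) -> np.ndarray:
    """Perturb only the regions outside the mask (assumed convention: 1 = keep, 0 = perturb).

    image: float array of shape (H, W, C)
    mask:  array of shape (H, W)
    """
    alpha = degradation_intensity(step, total_steps, alpha_init, gamma)
    noise = rng.standard_normal(image.shape)
    keep = mask[..., None].astype(image.dtype)  # broadcast the mask over channels
    degraded = image + alpha * noise            # noisy copy of the whole image
    # Keep masked pixels intact; use the noisy copy everywhere else.
    return keep * image + (1.0 - keep) * degraded
```

Because the intensity shrinks over training in this sketch, unwanted regions are heavily perturbed early on and then left almost untouched near the end, so the mask only matters during the early phase. Whether MagicTailor uses this particular schedule is not stated in the abstract.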