MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models

Abstract

Recent advancements in text-to-image (T2I) diffusion models have enabled the creation of high-quality images from text prompts, but these models still struggle to generate images with precise control over specific visual concepts. Existing approaches can replicate a given concept by learning from reference images, yet they lack the flexibility for fine-grained customization of individual components within the concept. In this paper, we introduce component-controllable personalization, a novel task that pushes the boundaries of T2I models by allowing users to reconfigure specific components when personalizing visual concepts. This task is particularly challenging due to two primary obstacles: semantic pollution, where unwanted visual elements corrupt the personalized concept, and semantic imbalance, which causes disproportionate learning of the concept and the component. To overcome these challenges, we design MagicTailor, an innovative framework that leverages Dynamic Masked Degradation (DM-Deg) to dynamically perturb undesired visual semantics and Dual-Stream Balancing (DS-Bal) to establish a balanced learning paradigm for desired visual semantics. Extensive comparisons, ablations, and analyses demonstrate that MagicTailor not only excels in this challenging task but also holds significant promise for practical applications, paving the way for more nuanced and creative image generation.
