AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
December 5, 2024
Authors: Xinghui Li, Qichao Sun, Pengze Zhang, Fulong Ye, Zhichao Liao, Wanquan Feng, Songtao Zhao, Qian He
cs.AI
Abstract
Recent advances in garment-centric image generation from text and image
prompts based on diffusion models are impressive. However, existing methods
lack support for various combinations of attire, and struggle to preserve the
garment details while maintaining faithfulness to the text prompts, limiting
their performance across diverse scenarios. In this paper, we focus on a new
task, i.e., Multi-Garment Virtual Dressing, and we propose a novel AnyDressing
method for customizing characters conditioned on any combination of garments
and any personalized text prompts. AnyDressing comprises two primary networks
named GarmentsNet and DressingNet, which are respectively dedicated to
extracting detailed clothing features and generating customized images.
Specifically, we propose an efficient and scalable module called
Garment-Specific Feature Extractor in GarmentsNet to individually encode
garment textures in parallel. This design prevents garment confusion while
ensuring network efficiency. Meanwhile, we design an adaptive
Dressing-Attention mechanism and a novel Instance-Level Garment Localization
Learning strategy in DressingNet to accurately inject multi-garment features
into their corresponding regions. This approach efficiently integrates
multi-garment texture cues into generated images and further enhances
text-image consistency. Additionally, we introduce a Garment-Enhanced Texture
Learning strategy to improve the fine-grained texture details of garments.
Thanks to our well-crafted design, AnyDressing can serve as a plug-in module to
easily integrate with any community control extensions for diffusion models,
improving the diversity and controllability of synthesized images. Extensive
experiments show that AnyDressing achieves state-of-the-art results.
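The abstract names GarmentsNet's Garment-Specific Feature Extractor and DressingNet's Dressing-Attention mechanism but gives no implementation details. Below is a minimal, hypothetical PyTorch sketch of the general idea only: a shared encoder applied to each garment independently (in parallel), followed by a masked cross-attention step that injects each garment's tokens into its designated latent region. All module names, shapes, and the masking scheme are assumptions for illustration, not the paper's actual architecture.

```python
# Hypothetical sketch (not the paper's code): parallel per-garment encoding
# plus a region-masked cross-attention "dressing" step.
import torch
import torch.nn as nn


class GarmentEncoder(nn.Module):
    """Shared encoder applied to each garment image independently."""
    def __init__(self, in_ch=3, dim=320):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, dim, kernel_size=8, stride=8),  # patchify
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=2, stride=2),
        )
        self.pool = nn.AdaptiveAvgPool2d(8)  # 8x8 = 64 tokens per garment

    def forward(self, garments):  # (B, N, 3, H, W) -> (B, N, 64, dim)
        b, n, c, h, w = garments.shape
        x = self.backbone(garments.view(b * n, c, h, w))
        x = self.pool(x).flatten(2).transpose(1, 2)  # (B*N, 64, dim)
        return x.reshape(b, n, x.shape[1], x.shape[2])


class DressingAttention(nn.Module):
    """Cross-attention from latent tokens to concatenated garment tokens.
    An optional per-garment mask restricts each garment to its own region."""
    def __init__(self, dim=320, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, latent_tokens, garment_tokens, region_masks=None):
        # latent_tokens: (B, L, dim); garment_tokens: (B, N, T, dim)
        b, n, t, d = garment_tokens.shape
        kv = garment_tokens.reshape(b, n * t, d)
        attn_mask = None
        if region_masks is not None:
            # region_masks: (B, N, L), 1 where garment i may influence position l
            allow = region_masks.transpose(1, 2)                 # (B, L, N)
            allow = allow.unsqueeze(-1).expand(b, -1, n, t)      # (B, L, N, T)
            attn_mask = (allow.reshape(b, -1, n * t) < 0.5)      # True = block
            attn_mask = attn_mask.repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(latent_tokens, kv, kv, attn_mask=attn_mask)
        return latent_tokens + out                               # residual injection


# Toy usage: two garments, each confined to half of a flattened 32x32 latent grid.
enc, dress = GarmentEncoder(), DressingAttention()
garments = torch.randn(1, 2, 3, 256, 256)
latents = torch.randn(1, 1024, 320)
masks = torch.zeros(1, 2, 1024)
masks[:, 0, :512] = 1.0   # garment 0 -> upper half
masks[:, 1, 512:] = 1.0   # garment 1 -> lower half
out = dress(latents, enc(garments), masks)
print(out.shape)  # torch.Size([1, 1024, 320])
```

The hard region mask here is just one plausible reading of "injecting multi-garment features into their corresponding regions"; the paper's Instance-Level Garment Localization Learning is described only at this level of detail in the abstract.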