Controllable Human Image Generation with Personalized Multi-Garments
November 25, 2024
Authors: Yisol Choi, Sangkyung Kwak, Sihyun Yu, Hyungwon Choi, Jinwoo Shin
cs.AI
Abstract
We present BootComp, a novel framework based on text-to-image diffusion
models for controllable human image generation with multiple reference
garments. Here, the main bottleneck is data acquisition for training:
collecting a large-scale dataset of high-quality reference garment images per
human subject is quite challenging, i.e., ideally, one needs to manually gather
every single garment photograph worn by each human. To address this, we propose
a data generation pipeline to construct a large synthetic dataset, consisting
of human and multiple-garment pairs, by introducing a model to extract any
reference garment images from each human image. To ensure data quality, we also
propose a filtering strategy to remove undesirable generated data based on
measuring perceptual similarity between the garment presented in the human
image and the extracted garment. Finally, by utilizing the constructed
synthetic dataset, we train a diffusion model with two parallel denoising
paths that use multiple garment images as conditions to generate human images
while preserving their fine-grained details. We further show the wide
applicability of our framework by adapting it to different types of
reference-based generation in the fashion domain, including virtual try-on,
and to controllable human image generation with other conditions, e.g., pose,
face, etc.
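The perceptual-similarity filtering is only described at a high level in the abstract. The sketch below shows one plausible realization, assuming LPIPS as the perceptual metric, a fixed distance threshold, and a `garment_crop` input standing in for the garment region taken from the human image; none of these specifics are stated in the paper.

```python
# Minimal sketch of a perceptual-similarity filter for synthetic
# (human, garment) pairs. Assumptions (not from the paper): LPIPS as
# the perceptual metric and a fixed distance threshold.
import torch
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net="alex")  # pretrained perceptual-distance network


def keep_pair(garment_crop: torch.Tensor,
              extracted_garment: torch.Tensor,
              threshold: float = 0.3) -> bool:
    """Keep a synthetic pair only if the garment region from the human image
    and the extracted garment image are perceptually close.

    Both inputs are (1, 3, H, W) tensors scaled to [-1, 1], as expected by LPIPS.
    """
    with torch.no_grad():
        dist = lpips_fn(garment_crop, extracted_garment).item()
    return dist < threshold
```

In such a setup, the threshold would be a tuning choice, e.g., calibrated on a small set of manually verified extractions rather than fixed a priori.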