Controllable Human Image Generation with Personalized Multi-Garments
November 25, 2024
Authors: Yisol Choi, Sangkyung Kwak, Sihyun Yu, Hyungwon Choi, Jinwoo Shin
cs.AI
Abstract
We present BootComp, a novel framework based on text-to-image diffusion models for controllable human image generation with multiple reference garments. Here, the main bottleneck is data acquisition for training: collecting a large-scale dataset of high-quality reference garment images per human subject is highly challenging, since, ideally, one would need to manually gather a photograph of every single garment worn by each person. To address this, we propose a data generation pipeline that constructs a large synthetic dataset of paired human and multiple-garment images by introducing a model that extracts reference garment images from each human image.
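The core artifact of this pipeline is a dataset of (human image, garment set) pairs. Below is a minimal structural sketch in Python of how such pairs might be assembled; `extract_garments` is a stand-in for the paper's learned decomposition model, whose details are not given in the abstract.

```python
from typing import Callable, List, Tuple
from PIL import Image

# One human photo paired with product-style images of every garment it shows.
HumanGarmentPair = Tuple[Image.Image, List[Image.Image]]

def build_synthetic_dataset(
    human_images: List[Image.Image],
    extract_garments: Callable[[Image.Image], List[Image.Image]],
) -> List[HumanGarmentPair]:
    """Pair each human photo with the garments extracted from it.

    `extract_garments` stands in for the decomposition model the paper
    introduces; it is assumed to return one image per worn garment.
    """
    dataset = []
    for human in human_images:
        garments = extract_garments(human)  # decomposition model output
        if garments:  # skip photos where no garment could be recovered
            dataset.append((human, garments))
    return dataset
```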
To ensure data quality, we also propose a filtering strategy that removes undesirable generated data based on the perceptual similarity between the garment as it appears in the human image and the extracted garment image.
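As one plausible instantiation of this filter, the sketch below scores each pair with LPIPS and drops pairs whose perceptual distance exceeds a threshold. The choice of LPIPS and the cutoff of 0.3 are assumptions made for illustration; the paper's exact metric and threshold may differ.

```python
# Minimal sketch of the perceptual filtering step, assuming LPIPS as the
# similarity metric. The threshold and crop logic are illustrative only.
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net="vgg")  # perceptual distance network

def keep_pair(worn_garment_crop: torch.Tensor,
              extracted_garment: torch.Tensor,
              threshold: float = 0.3) -> bool:
    """Keep a (human, garment) pair only if the garment crop taken from
    the human image and the extracted garment image are perceptually close.

    Both inputs are (1, 3, H, W) tensors scaled to [-1, 1].
    """
    with torch.no_grad():
        dist = loss_fn(worn_garment_crop, extracted_garment).item()
    return dist < threshold
```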
Finally, using the constructed synthetic dataset, we train a diffusion model with two parallel denoising paths that take multiple garment images as conditions for generating human images while preserving their fine-grained details. We further demonstrate the wide applicability of our framework by adapting it to different types of reference-based generation in the fashion domain, including virtual try-on, and to controllable human image generation with other conditions, e.g., pose and face.
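One way to picture the two parallel denoising paths is a ReferenceNet-style design: a garment U-Net encodes each garment latent while the main U-Net denoises the human latent and attends to the garment features. The sketch below is a conceptual outline under that assumption; the class and the `return_features`/`reference_features` hooks are hypothetical names, not the paper's actual API.

```python
# Conceptual sketch of the two parallel denoising paths, assuming a
# ReferenceNet-style design. The feature-passing interface is hypothetical;
# the paper's exact architecture may differ.
import torch.nn as nn

class ParallelDenoiser(nn.Module):
    def __init__(self, main_unet: nn.Module, garment_unet: nn.Module):
        super().__init__()
        self.main_unet = main_unet        # generates the human image
        self.garment_unet = garment_unet  # encodes reference garments

    def forward(self, noisy_latent, timestep, text_emb, garment_latents):
        # Path 1: run each garment latent through the garment U-Net and
        # collect its intermediate features (hypothetical kwarg).
        garment_feats = [
            self.garment_unet(g, timestep, text_emb, return_features=True)
            for g in garment_latents  # one latent per reference garment
        ]
        # Path 2: denoise the human latent while attending to the garment
        # features so fine-grained details are preserved.
        return self.main_unet(noisy_latent, timestep, text_emb,
                              reference_features=garment_feats)
```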