FashionComposer: Compositional Fashion Image Generation
December 18, 2024
Authors: Sihui Ji, Yiyang Wang, Xi Chen, Xiaogang Xu, Hao Luo, Hengshuang Zhao
cs.AI
Abstract
We present FashionComposer for compositional fashion image generation. Unlike
previous methods, FashionComposer is highly flexible. It takes multi-modal
input (i.e., text prompt, parametric human model, garment image, and face
image) and supports personalizing the appearance, pose, and figure of the human
and assigning multiple garments in one pass. To achieve this, we first develop
a universal framework capable of handling diverse input modalities. We
construct scaled training data to enhance the model's robust compositional
capabilities. To accommodate multiple reference images (garments and faces)
seamlessly, we organize these references in a single image as an "asset
library" and employ a reference UNet to extract appearance features. To inject
the appearance features into the correct pixels in the generated result, we
propose subject-binding attention. It binds the appearance features from
different "assets" with the corresponding text features. In this way, the model
can understand each asset according to its semantics, supporting arbitrary
numbers and types of reference images. As a comprehensive solution,
FashionComposer also supports many other applications like human album
generation, diverse virtual try-on tasks, etc.
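The abstract describes two mechanisms: a reference UNet that extracts appearance features from the single "asset library" image, and subject-binding attention that ties each asset's appearance features to the text tokens describing it. The sketch below illustrates the subject-binding idea; the module name, tensor shapes, pooled per-asset features, the token-to-asset mapping, and the concatenation-based binding are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of subject-binding attention as described in the abstract:
# appearance features of each "asset" (garment or face) are bound to the text
# tokens that refer to it, so cross-attention routes each asset's appearance
# to the correct pixels. Shapes and the binding scheme are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SubjectBindingAttention(nn.Module):
    def __init__(self, dim: int, text_dim: int, appearance_dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        # Keys/values come from text tokens fused with the appearance
        # features of the asset they describe (the "binding" step).
        self.to_k = nn.Linear(text_dim + appearance_dim, dim, bias=False)
        self.to_v = nn.Linear(text_dim + appearance_dim, dim, bias=False)

    def forward(self, hidden_states, text_tokens, appearance_feats, token_to_asset):
        """
        hidden_states:    (B, N_pixels, dim)            denoising-UNet spatial features
        text_tokens:      (B, N_tokens, text_dim)       prompt embeddings
        appearance_feats: (B, N_assets, appearance_dim) pooled reference-UNet features,
                          one vector per asset in the asset library (assumed pooling)
        token_to_asset:   (N_tokens,) long tensor mapping each text token to the
                          index of the asset it refers to (assumed bookkeeping input)
        """
        # For every text token, gather the appearance feature of its asset
        # and bind it to the token by concatenation.
        bound = appearance_feats[:, token_to_asset, :]     # (B, N_tokens, appearance_dim)
        context = torch.cat([text_tokens, bound], dim=-1)  # (B, N_tokens, text_dim + appearance_dim)

        q = self.to_q(hidden_states)
        k = self.to_k(context)
        v = self.to_v(context)
        attn = F.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v                                     # (B, N_pixels, dim)
```

In this reading, the denoising UNet's spatial features act as queries while the bound text-plus-appearance tokens act as keys and values, so a garment or face reference only influences pixels whose text tokens refer to it, regardless of how many references are supplied.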