Generating Multi-Image Synthetic Data for Text-to-Image Customization

February 3, 2025
Authors: Nupur Kumari, Xi Yin, Jun-Yan Zhu, Ishan Misra, Samaneh Azadi
cs.AI

Abstract

Customization of text-to-image models enables users to insert custom concepts and generate the concepts in unseen settings. Existing methods either rely on costly test-time optimization or train encoders on single-image training datasets without multi-image supervision, leading to worse image quality. We propose a simple approach that addresses both limitations. We first leverage existing text-to-image models and 3D datasets to create a high-quality Synthetic Customization Dataset (SynCD) consisting of multiple images of the same object in different lighting, backgrounds, and poses. We then propose a new encoder architecture based on shared attention mechanisms that better incorporate fine-grained visual details from input images. Finally, we propose a new inference technique that mitigates overexposure issues during inference by normalizing the text and image guidance vectors. Through extensive experiments, we show that our model, trained on the synthetic dataset with the proposed encoder and inference algorithm, outperforms existing tuning-free methods on standard customization benchmarks.
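
The abstract names two concrete mechanisms that can be illustrated with short sketches. The first is a minimal, hypothetical version of a shared-attention layer, assuming a standard multi-head attention formulation: the target branch queries jointly over its own tokens and the reference-image tokens, so fine-grained reference details can flow into the generation. All module and variable names are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn

class SharedAttention(nn.Module):
    """Sketch of shared attention: keys/values span target + reference tokens."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, target_tokens, reference_tokens):
        # Queries come from the target (generated) image features only.
        q = self.to_q(target_tokens)
        # Keys and values are shared across target and reference features,
        # letting the target attend to fine-grained reference details.
        kv = torch.cat([target_tokens, reference_tokens], dim=1)
        k, v = self.to_k(kv), self.to_v(kv)

        def split(x):  # (B, N, D) -> (B, heads, N, D // heads)
            b, n, d = x.shape
            return x.view(b, n, self.heads, d // self.heads).transpose(1, 2)

        out = F.scaled_dot_product_attention(split(q), split(k), split(v))
        b, h, n, dh = out.shape
        return self.to_out(out.transpose(1, 2).reshape(b, n, h * dh))
```

The second sketch shows one plausible way to normalize the text and image guidance vectors in a dual-condition classifier-free-guidance step, so that large guidance weights do not inflate the magnitude of the noise prediction (the overexposure symptom the abstract mentions). The rescaling rule used here, matching each guidance direction to the norm of the unconditional prediction, is an assumption for illustration, not the paper's exact formula.

```python
import torch

def guided_noise(eps_uncond, eps_text, eps_image, w_text=7.5, w_image=3.0):
    """Combine unconditional, text-, and image-conditioned noise predictions
    with per-sample normalization of the two guidance directions."""
    g_text = eps_text - eps_uncond    # text guidance vector
    g_image = eps_image - eps_uncond  # image (reference) guidance vector

    def normalize(g, ref):
        # Illustrative normalization (the paper's exact rule may differ):
        # rescale g so its per-sample norm matches that of the reference tensor.
        scale = ref.flatten(1).norm(dim=1) / (g.flatten(1).norm(dim=1) + 1e-8)
        return g * scale.view(-1, *([1] * (g.dim() - 1)))

    g_text = normalize(g_text, eps_uncond)
    g_image = normalize(g_image, eps_uncond)
    return eps_uncond + w_text * g_text + w_image * g_image
```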
