Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
April 2, 2025
Authors: Shaojin Wu, Mengqi Huang, Wenxu Wu, Yufeng Cheng, Fei Ding, Qian He
cs.AI
Abstract
Although subject-driven generation has been extensively explored in image
generation due to its wide applications, it still faces challenges in data
scalability and subject expansibility. For the first challenge, moving from
curating single-subject datasets to multiple-subject ones and scaling them is
particularly difficult. For the second, most recent methods center on
single-subject generation, making them hard to apply in multi-subject
scenarios. In this study, we propose a highly consistent data synthesis
pipeline to tackle these challenges. This pipeline harnesses the
intrinsic in-context generation capabilities of diffusion transformers and
generates high-consistency multi-subject paired data. Additionally, we
introduce UNO, a multi-image-conditioned subject-to-image model trained
iteratively from a text-to-image model, built on progressive cross-modal
alignment and a universal rotary position embedding.
Extensive experiments show that our method can achieve high consistency while
ensuring controllability in both single-subject and multi-subject driven
generation.
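
To make the in-context data synthesis idea concrete, the sketch below prompts a diffusion transformer for a two-panel image of a single subject and splits the result into a reference/target training pair. This is a minimal illustration, not the authors' released pipeline: it assumes a FLUX-style model served through the diffusers FluxPipeline API, and it omits the filtering and consistency checks a real pipeline would need.

```python
# Minimal sketch of in-context pair synthesis (assumes a FLUX-style DiT
# available via diffusers; NOT the authors' actual pipeline).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Ask the model for a two-panel "diptych" of the same subject; in-context
# generation tends to keep the subject consistent across panels.
prompt = (
    "A two-panel image of the same plush robot toy. "
    "Left: the robot on a wooden desk. Right: the robot on a sandy beach."
)
image = pipe(prompt=prompt, height=512, width=1024,
             guidance_scale=3.5, num_inference_steps=28).images[0]

# Split the diptych into a (reference, target) training pair.
w, h = image.size
reference = image.crop((0, 0, w // 2, h))
target = image.crop((w // 2, 0, w, h))
```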
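The "universal rotary position embedding" concerns how reference-image tokens are indexed before RoPE is applied. The sketch below shows one plausible reading: give each reference latent 2D position indices offset diagonally past the target latent, so condition tokens never share an index with the generated image's tokens. The function name and exact offset scheme here are our illustration, not the paper's precise formulation.

```python
# Illustrative sketch (not the authors' code): non-overlapping 2D position
# indices for target and reference-image tokens prior to rotary embedding.
import torch

def build_position_ids(target_hw, ref_hws):
    """Return (row, col) position ids for the target latent followed by
    each reference latent, offset diagonally so reference tokens never
    collide with target-token positions."""
    ids = []
    h, w = target_hw
    rows, cols = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    ids.append(torch.stack([rows, cols], dim=-1).reshape(-1, 2))
    off_r, off_c = h, w  # running diagonal offset past the target grid
    for rh, rw in ref_hws:
        rows, cols = torch.meshgrid(torch.arange(rh) + off_r,
                                    torch.arange(rw) + off_c, indexing="ij")
        ids.append(torch.stack([rows, cols], dim=-1).reshape(-1, 2))
        off_r, off_c = off_r + rh, off_c + rw
    return torch.cat(ids, dim=0)  # (num_tokens, 2)

# Example: one 32x32 target latent and two 16x16 reference latents.
pos = build_position_ids((32, 32), [(16, 16), (16, 16)])
print(pos.shape)  # torch.Size([1536, 2])
```

Keeping the index ranges disjoint means the model cannot trivially copy a reference image into the same spatial slots of the target, which is one way a multi-image-conditioned model can stay controllable by the text prompt.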