Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
April 2, 2025
Authors: Shaojin Wu, Mengqi Huang, Wenxu Wu, Yufeng Cheng, Fei Ding, Qian He
cs.AI
Abstract
Although subject-driven generation has been extensively explored in image
generation due to its wide applications, it still faces challenges in data
scalability and subject expansibility. For the first challenge, moving from
curating single-subject datasets to multi-subject ones, and scaling them up,
is particularly difficult. For the second, most recent methods center on
single-subject generation, making them hard to apply to multi-subject
scenarios. In this study, we propose a highly consistent data synthesis
pipeline to tackle these challenges. The pipeline harnesses the intrinsic
in-context generation capabilities of diffusion transformers to produce
highly consistent multi-subject paired data. Additionally, we introduce UNO,
a multi-image-conditioned subject-to-image model trained iteratively from a
text-to-image model, which consists of progressive cross-modal alignment and
universal rotary position embedding. Extensive experiments show that our
method achieves high consistency while ensuring controllability in both
single-subject and multi-subject driven generation.
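To make the data-synthesis idea concrete, below is a minimal sketch (not the paper's actual pipeline) of how a diffusion transformer's in-context generation can be used to mine consistent paired data: one generation call produces a two-panel image of the same subject, which is then split into a (reference, target) pair. The model choice (diffusers' FluxPipeline as a stand-in DiT), the diptych prompt template, and the function name `synthesize_pair` are illustrative assumptions; the paper's pipeline would also add filtering to keep only highly consistent pairs.

```python
# Sketch: mining subject-consistent paired data via in-context generation.
# Assumptions: FLUX.1-dev as a stand-in DiT; the diptych prompt and the
# crop-based split are illustrative, not the authors' implementation.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

def synthesize_pair(subject: str, scene_a: str, scene_b: str):
    # One generation call; the DiT's in-context ability keeps the subject
    # consistent across both panels of the diptych.
    prompt = (
        f"Two side-by-side panels of the same {subject}; "
        f"left panel: {scene_a}; right panel: {scene_b}."
    )
    image = pipe(prompt, height=512, width=1024, num_inference_steps=28).images[0]
    w, h = image.size
    left = image.crop((0, 0, w // 2, h))
    right = image.crop((w // 2, 0, w, h))
    return left, right  # candidate (reference, target) paired sample

ref, tgt = synthesize_pair(
    "corgi wearing a red scarf", "a snowy park", "a beach at sunset"
)
```

In a real pipeline one would score each pair for subject consistency (e.g., with an image-similarity model) and discard inconsistent samples before training.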
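The abstract also names a "universal rotary position embedding" for multi-image conditioning. The sketch below illustrates one plausible reading of that idea, under stated assumptions: each extra reference image gets its 2D RoPE indices shifted by a diagonal offset so its tokens never collide with the positions of the latent being denoised. The offset scheme and the helper names `build_2d_positions` / `multi_image_positions` are assumptions for illustration, not the authors' code.

```python
# Sketch: offset 2D position indices for multi-image RoPE conditioning.
# Assumption: diagonal offsets per reference image let attention distinguish
# images without a learned segment embedding.
import torch

def build_2d_positions(height: int, width: int, offset: int = 0) -> torch.Tensor:
    """Return (height*width, 2) integer (row, col) indices, shifted by `offset`."""
    rows = torch.arange(height).repeat_interleave(width)
    cols = torch.arange(width).repeat(height)
    return torch.stack([rows + offset, cols + offset], dim=-1)

def multi_image_positions(latent_hw, ref_hws):
    """Concatenate RoPE positions for the target latent and each reference.

    latent_hw: (H, W) of the noised latent being denoised.
    ref_hws:   list of (H, W), one per conditioning image.
    Each reference starts at a fresh diagonal offset beyond all previously
    assigned indices, keeping every image's token positions disjoint.
    """
    positions = [build_2d_positions(*latent_hw)]
    offset = max(latent_hw)
    for h, w in ref_hws:
        positions.append(build_2d_positions(h, w, offset=offset))
        offset += max(h, w)
    return torch.cat(positions, dim=0)  # (total_tokens, 2), fed to RoPE

# Example: a 32x32 latent conditioned on two 32x32 reference images.
pos = multi_image_positions((32, 32), [(32, 32), (32, 32)])
print(pos.shape)  # torch.Size([3072, 2])
```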