DreamO: A Unified Framework for Image Customization
April 23, 2025
Authors: Chong Mou, Yanze Wu, Wenxu Wu, Zinan Guo, Pengze Zhang, Yufeng Cheng, Yiming Luo, Fei Ding, Shiwen Zhang, Xinghui Li, Mengtian Li, Songtao Zhao, Jian Zhang, Qian He, Xinglong Wu
cs.AI
Abstract
Recently, extensive research on image customization (e.g., identity, subject,
style, background) has demonstrated strong customization capabilities in
large-scale generative models. However, most approaches are designed for
specific tasks, which limits their ability to combine different types
of conditions. Developing a unified framework for image customization remains an
open challenge. In this paper, we present DreamO, an image customization
framework designed to support a wide range of tasks while facilitating seamless
integration of multiple conditions. Specifically, DreamO utilizes a diffusion
transformer (DiT) framework to uniformly process inputs of different types.
During training, we construct a large-scale training dataset that includes
various customization tasks, and we introduce a feature routing constraint to
facilitate the precise querying of relevant information from reference images.
Additionally, we design a placeholder strategy that associates specific
placeholders with conditions at particular positions, enabling control over the
placement of conditions in the generated results. Moreover, we employ a
progressive training strategy consisting of three stages: an initial stage
focused on simple tasks with limited data to establish baseline consistency, a
full-scale training stage to comprehensively enhance the customization
capabilities, and a final quality alignment stage to correct quality biases
introduced by low-quality data. Extensive experiments demonstrate that the
proposed DreamO can effectively perform various image customization tasks with
high quality and flexibly integrate different types of control conditions.