DreamO: A Unified Framework for Image Customization
April 23, 2025
Authors: Chong Mou, Yanze Wu, Wenxu Wu, Zinan Guo, Pengze Zhang, Yufeng Cheng, Yiming Luo, Fei Ding, Shiwen Zhang, Xinghui Li, Mengtian Li, Songtao Zhao, Jian Zhang, Qian He, Xinglong Wu
cs.AI
Abstract
Recently, extensive research on image customization (e.g., identity, subject,
style, background) has demonstrated strong customization capabilities in
large-scale generative models. However, most approaches are designed for
specific tasks, which limits their ability to combine different types
of conditions. Developing a unified framework for image customization remains an
open challenge. In this paper, we present DreamO, an image customization
framework designed to support a wide range of tasks while facilitating seamless
integration of multiple conditions. Specifically, DreamO utilizes a diffusion
transformer (DiT) framework to uniformly process inputs of different types.
During training, we construct a large-scale training dataset that includes
various customization tasks, and we introduce a feature routing constraint to
facilitate the precise querying of relevant information from reference images.
Additionally, we design a placeholder strategy that associates specific
placeholders with conditions at particular positions, enabling control over the
placement of conditions in the generated results. Moreover, we employ a
progressive training strategy consisting of three stages: an initial stage
focused on simple tasks with limited data to establish baseline consistency, a
full-scale training stage to comprehensively enhance the customization
capabilities, and a final quality alignment stage to correct quality biases
introduced by low-quality data. Extensive experiments demonstrate that the
proposed DreamO can effectively perform various image customization tasks with
high quality and flexibly integrate different types of control conditions.