

OminiControl: Minimal and Universal Control for Diffusion Transformer

November 22, 2024
Authors: Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, Xinchao Wang
cs.AI

Abstract

In this paper, we introduce OminiControl, a highly versatile and parameter-efficient framework that integrates image conditions into pre-trained Diffusion Transformer (DiT) models. At its core, OminiControl leverages a parameter reuse mechanism, enabling the DiT to encode image conditions using itself as a powerful backbone and process them with its flexible multi-modal attention processors. Unlike existing methods, which rely heavily on additional encoder modules with complex architectures, OminiControl (1) effectively and efficiently incorporates injected image conditions with only ~0.1% additional parameters, and (2) addresses a wide range of image conditioning tasks in a unified manner, including subject-driven generation and spatially-aligned conditions such as edges, depth, and more. Remarkably, these capabilities are achieved by training on images generated by the DiT itself, which is particularly beneficial for subject-driven generation. Extensive evaluations demonstrate that OminiControl outperforms existing UNet-based and DiT-adapted models in both subject-driven and spatially-aligned conditional generation. Additionally, we release our training dataset, Subjects200K, a diverse collection of over 200,000 identity-consistent images, along with an efficient data synthesis pipeline to advance research in subject-consistent generation.
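The abstract does not spell out how the DiT "processes" image conditions with its multi-modal attention. One plausible reading, sketched below as an assumption rather than the authors' implementation, is that condition-image tokens are appended to the text and noisy-image token sequence and mixed by a single shared attention pass. The function names (`attention`, `joint_attention`), the identity projections, and the toy token sizes are all hypothetical and chosen only to keep the example self-contained.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Illustrative only: a real DiT block uses learned projections,
    multiple heads, and normalization layers.
    """
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)]
        )
    return out


def joint_attention(text_tokens, image_tokens, condition_tokens):
    """Assumed mechanism: one attention pass over the concatenated sequence.

    Returns only the updated noisy-image tokens, which now carry
    information from both the text and the condition image.
    """
    sequence = text_tokens + image_tokens + condition_tokens
    mixed = attention(sequence)
    start = len(text_tokens)
    return mixed[start:start + len(image_tokens)]


def random_tokens(count, dim, rng):
    """Hypothetical stand-in for encoded tokens."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM = 4
text = random_tokens(3, DIM, rng)
image = random_tokens(5, DIM, rng)
condition_a = random_tokens(5, DIM, rng)
condition_b = random_tokens(5, DIM, rng)

updated_a = joint_attention(text, image, condition_a)
updated_b = joint_attention(text, image, condition_b)
```

Because the condition tokens sit in the same sequence as the image tokens, swapping `condition_a` for `condition_b` changes the updated image tokens without any separate encoder module, which is the property the abstract emphasizes.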
