OminiControl: Minimal and Universal Control for Diffusion Transformer
November 22, 2024
Authors: Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, Xinchao Wang
cs.AI
Abstract
In this paper, we introduce OminiControl, a highly versatile and
parameter-efficient framework that integrates image conditions into pre-trained
Diffusion Transformer (DiT) models. At its core, OminiControl leverages a
parameter reuse mechanism, enabling the DiT to encode image conditions using
itself as a powerful backbone and process them with its flexible multi-modal
attention processors. Unlike existing methods, which rely heavily on additional
encoder modules with complex architectures, OminiControl (1) effectively and
efficiently incorporates injected image conditions with only ~0.1% additional
parameters, and (2) addresses a wide range of image conditioning tasks in a
unified manner, including subject-driven generation and spatially-aligned
conditions such as edges, depth, and more. Remarkably, these capabilities are
achieved by training on images generated by the DiT itself, which is
particularly beneficial for subject-driven generation. Extensive evaluations
demonstrate that OminiControl outperforms existing UNet-based and DiT-adapted
models in both subject-driven and spatially-aligned conditional generation.
Additionally, we release our training dataset, Subjects200K, a diverse
collection of over 200,000 identity-consistent images, along with an efficient
data synthesis pipeline to advance research in subject-consistent generation.
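
To give a feel for what "~0.1% additional parameters" means in practice, here is a rough back-of-the-envelope sketch. The abstract does not say how the extra parameters are introduced, so the sketch assumes low-rank adapters attached to existing dense layers, purely for illustration. The hidden size, layer count, block count, and rank are hypothetical values, not figures from the paper.

```python
def linear_params(d_in: int, d_out: int) -> int:
    """Parameter count of a dense layer: weight matrix plus bias."""
    return d_in * d_out + d_out


def low_rank_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameter count of two thin matrices (d_in x rank, rank x d_out), no bias.

    Assumption: a low-rank adapter stands in for the paper's unspecified
    lightweight trainable component.
    """
    return rank * (d_in + d_out)


def overhead_fraction(hidden: int, layers_per_block: int, blocks: int, rank: int) -> float:
    """Added parameters divided by frozen base parameters for a toy transformer."""
    n_layers = blocks * layers_per_block
    base = n_layers * linear_params(hidden, hidden)
    added = n_layers * low_rank_adapter_params(hidden, hidden, rank)
    return added / base


# Hypothetical sizes, chosen only to illustrate the order of magnitude.
frac = overhead_fraction(hidden=3072, layers_per_block=4, blocks=10, rank=2)
print(f"extra parameters: {frac:.4%} of the base model")
```

With these toy numbers the overhead comes out on the order of a tenth of a percent, because each adapter grows linearly with the hidden size while the frozen layer it attaches to grows quadratically.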