OmniGen: Unified Image Generation

September 17, 2024
Authors: Shitao Xiao, Yueze Wang, Junjie Zhou, Huaying Yuan, Xingrun Xing, Ruiran Yan, Shuting Wang, Tiejun Huang, Zheng Liu
cs.AI

Abstract

In this work, we introduce OmniGen, a new diffusion model for unified image generation. Unlike popular diffusion models (e.g., Stable Diffusion), OmniGen no longer requires additional modules such as ControlNet or IP-Adapter to process diverse control conditions. OmniGen is characterized by the following features: 1) Unification: OmniGen not only demonstrates text-to-image generation capabilities but also inherently supports other downstream tasks, such as image editing, subject-driven generation, and visual-conditional generation. Additionally, OmniGen can handle classical computer vision tasks, such as edge detection and human pose recognition, by transforming them into image generation tasks. 2) Simplicity: The architecture of OmniGen is highly simplified, eliminating the need for additional text encoders. Moreover, it is more user-friendly than existing diffusion models: complex tasks can be accomplished through instructions alone, without extra preprocessing steps (e.g., human pose estimation), which significantly simplifies the image generation workflow. 3) Knowledge Transfer: By learning in a unified format, OmniGen effectively transfers knowledge across different tasks, handles unseen tasks and domains, and exhibits novel capabilities. We also explore the model's reasoning capabilities and potential applications of the chain-of-thought mechanism. This work represents the first attempt at a general-purpose image generation model, and several issues remain unresolved. We will open-source the related resources at https://github.com/VectorSpaceLab/OmniGen to foster advancements in this field.

