Stable Flow: Vital Layers for Training-Free Image Editing
November 21, 2024
Authors: Omri Avrahami, Or Patashnik, Ohad Fried, Egor Nemchinov, Kfir Aberman, Dani Lischinski, Daniel Cohen-Or
cs.AI
Abstract
Diffusion models have revolutionized the field of content synthesis and
editing. Recent models have replaced the traditional UNet architecture with the
Diffusion Transformer (DiT), and employed flow-matching for improved training
and sampling. However, they exhibit limited generation diversity. In this work,
we leverage this limitation to perform consistent image edits via selective
injection of attention features. The main challenge is that, unlike the
UNet-based models, DiT lacks a coarse-to-fine synthesis structure, making it
unclear in which layers to perform the injection. Therefore, we propose an
automatic method to identify "vital layers" within DiT, crucial for image
formation, and demonstrate how these layers facilitate a range of controlled
stable edits, from non-rigid modifications to object addition, using the same
mechanism. Next, to enable real-image editing, we introduce an improved image
inversion method for flow models. Finally, we evaluate our approach through
qualitative and quantitative comparisons, along with a user study, and
demonstrate its effectiveness across multiple applications. The project page is
available at https://omriavrahami.com/stable-flow.
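The layer-identification idea described above can be illustrated with a minimal sketch. The Python below is a hypothetical ablation loop, not the authors' implementation: it assumes a model object exposing `blocks` (the DiT transformer blocks) and a `generate` sampling call, plus a perceptual `distance_fn` (all assumptions). Layers whose removal changes the generated image the most are treated as "vital".

```python
import torch

@torch.no_grad()
def rank_vital_layers(model, prompt, distance_fn, seed=0):
    """Rank DiT blocks by how much bypassing each one changes the output.

    `model.blocks` and `model.generate` are hypothetical names; substitute
    your DiT pipeline's actual modules and sampling call.
    """
    torch.manual_seed(seed)
    reference = model.generate(prompt)          # image with all layers active
    scores = []
    for i, block in enumerate(model.blocks):
        original_forward = block.forward
        block.forward = lambda x, *a, **kw: x   # bypass: identity mapping
        torch.manual_seed(seed)                 # reuse the noise for a fair comparison
        ablated = model.generate(prompt)
        block.forward = original_forward        # restore the block
        scores.append((i, distance_fn(reference, ablated)))
    # Larger distance => removing the layer changes the image more => more vital.
    return sorted(scores, key=lambda s: s[1], reverse=True)
```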
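Likewise, the selective attention-feature injection that drives the edits could be sketched with PyTorch forward hooks: features cached at the vital layers during a pass over the source generation are substituted into the same layers during the edit pass. The module path `model.blocks[i].attn` and the choice to cache the attention output are assumptions; the actual injection point depends on the DiT implementation, and a real pipeline would also key the cache by denoising step rather than overwriting it.

```python
import torch

def cache_attention(model, vital_idx):
    """Record the attention outputs of the vital layers during a pass
    over the source generation (hypothetical `model.blocks[i].attn`)."""
    cache, handles = {}, []
    for i in vital_idx:
        def hook(module, args, output, layer=i):
            cache[layer] = output               # simplification: last step only
        handles.append(model.blocks[i].attn.register_forward_hook(hook))
    return cache, handles

def inject_attention(model, vital_idx, cache):
    """During the edit pass, replace each vital layer's attention output
    with the cached source features (returning a non-None value from a
    forward hook overrides the module's output in PyTorch)."""
    handles = []
    for i in vital_idx:
        def hook(module, args, output, layer=i):
            return cache[layer]
        handles.append(model.blocks[i].attn.register_forward_hook(hook))
    return handles  # call h.remove() on each handle to undo the patches
```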