Stable Flow: Vital Layers for Training-Free Image Editing
November 21, 2024
Authors: Omri Avrahami, Or Patashnik, Ohad Fried, Egor Nemchinov, Kfir Aberman, Dani Lischinski, Daniel Cohen-Or
cs.AI
Abstract
Diffusion models have revolutionized the field of content synthesis and
editing. Recent models have replaced the traditional UNet architecture with the
Diffusion Transformer (DiT), and employed flow-matching for improved training
and sampling. However, they exhibit limited generation diversity. In this work,
we leverage this limitation to perform consistent image edits via selective
injection of attention features. The main challenge is that, unlike the
UNet-based models, DiT lacks a coarse-to-fine synthesis structure, making it
unclear in which layers to perform the injection. Therefore, we propose an
automatic method to identify "vital layers" within DiT, crucial for image
formation, and demonstrate how these layers facilitate a range of controlled
stable edits, from non-rigid modifications to object addition, using the same
mechanism. Next, to enable real-image editing, we introduce an improved image
inversion method for flow models. Finally, we evaluate our approach through
qualitative and quantitative comparisons, along with a user study, and
demonstrate its effectiveness across multiple applications. The project page is
available at https://omriavrahami.com/stable-flow.
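The core idea of identifying "vital layers" can be illustrated with a toy ablation loop: run a stack of layers once, then re-run it with each layer bypassed in turn and measure how much the output changes. This is only a minimal sketch of the ablation principle, not the paper's implementation: the toy matrix layers below stand in for DiT blocks, and the L2 distance stands in for a perceptual distance between generated images.

```python
import numpy as np

def layer_importance(layers, x):
    """Rank layers by how much bypassing each one changes the final output.

    Toy sketch: importance of layer i is the distance between the full
    forward pass and a pass where layer i is replaced by the identity.
    Layers whose removal barely changes the output are non-vital.
    """
    def forward(skip=None):
        h = x
        for i, layer in enumerate(layers):
            if i != skip:  # bypass one layer when measuring its importance
                h = layer(h)
        return h

    reference = forward()
    return [float(np.linalg.norm(reference - forward(skip=i)))
            for i in range(len(layers))]

# Toy example: three "layers"; the middle one is an identity,
# so bypassing it changes nothing and it scores as non-vital.
rng = np.random.default_rng(0)
W1, W3 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
layers = [lambda h: h @ W1, lambda h: h, lambda h: h @ W3]
scores = layer_importance(layers, rng.standard_normal(4))
```

In this toy setting, `scores[1]` is zero while the other two scores are positive, mirroring how the paper's measurement separates layers that are crucial for image formation from those that can be bypassed.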