Stable Flow: Vital Layers for Training-Free Image Editing

Abstract

Diffusion models have revolutionized the field of content synthesis and editing. Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT) and adopted flow matching for improved training and sampling. However, these models exhibit limited generation diversity. In this work, we leverage this limitation to perform consistent image edits via selective injection of attention features. The main challenge is that, unlike UNet-based models, DiT lacks a coarse-to-fine synthesis structure, so it is unclear in which layers the injection should be performed. We therefore propose an automatic method to identify "vital layers" within DiT, which are crucial for image formation, and demonstrate how these layers facilitate a range of controlled, stable edits, from non-rigid modifications to object addition, using the same mechanism. Next, to enable real-image editing, we introduce an improved image inversion method for flow models. Finally, we evaluate our approach through qualitative and quantitative comparisons, along with a user study, and demonstrate its effectiveness across multiple applications. The project page is available at https://omriavrahami.com/stable-flow
