FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models

Abstract

Editing real images with a pre-trained text-to-image (T2I) diffusion or flow model often involves inverting the image into its corresponding noise map. However, inversion by itself is typically insufficient for obtaining satisfactory results, so many methods additionally intervene in the sampling process. Such methods achieve improved results but do not transfer seamlessly between model architectures. Here, we introduce FlowEdit, a text-based editing method for pre-trained T2I flow models that is inversion-free, optimization-free, and model-agnostic. Our method constructs an ODE that maps directly between the source and target distributions (corresponding to the source and target text prompts) and achieves a lower transport cost than the inversion approach. This leads to state-of-the-art results, as we illustrate with Stable Diffusion 3 and FLUX. Code and examples are available on the project's webpage.
