Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Abstract

Generative models transform random noise into images; their inversion aims to transform images back to structured noise for recovery and editing. This paper addresses two key tasks: (i) inversion and (ii) editing of a real image using stochastic equivalents of rectified flow models (such as Flux). Although Diffusion Models (DMs) have recently dominated the field of generative modeling for images, their inversion presents faithfulness and editability challenges due to nonlinearities in drift and diffusion. Existing state-of-the-art DM inversion approaches rely on training of additional parameters or test-time optimization of latent variables; both are expensive in practice. Rectified Flows (RFs) offer a promising alternative to diffusion models, yet their inversion has been underexplored. We propose RF inversion using dynamic optimal control derived via a linear quadratic regulator. We prove that the resulting vector field is equivalent to a rectified stochastic differential equation. Additionally, we extend our framework to design a stochastic sampler for Flux. Our inversion method achieves state-of-the-art performance in zero-shot inversion and editing, outperforming prior works in stroke-to-image synthesis and semantic image editing, with large-scale human evaluations confirming user preference.

