PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation

Abstract

Recent research explores the potential of Diffusion Models (DMs) for consistent object editing, which aims to modify an object's position, size, composition, and similar properties while preserving the consistency of objects and background, leaving their texture and attributes unchanged. Current inference-time methods often rely on DDIM inversion, which inherently compromises both efficiency and the achievable consistency of edited images. Recent methods also use energy guidance, which iteratively updates the predicted noise and can drive the latents away from the original image, resulting in distortions. In this paper, we propose PixelMan, an inversion-free and training-free method for consistent object editing via Pixel Manipulation and generation. We directly create a duplicate copy of the source object at the target location in pixel space, and introduce an efficient sampling approach that iteratively harmonizes the manipulated object into the target location and inpaints its original location. Image consistency is ensured by anchoring the edited image to be generated to the pixel-manipulated image and by introducing various consistency-preserving optimization techniques during inference. Experimental evaluations on benchmark datasets, together with extensive visual comparisons, show that in as few as 16 inference steps PixelMan outperforms a range of state-of-the-art training-based and training-free methods (usually requiring 50 steps) on multiple consistent object editing tasks.
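The pixel-manipulation step described above (duplicating the source object at the target location in pixel space) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `duplicate_object`, the boolean-mask representation of the object, and the integer offsets `dx` and `dy` are assumptions made for illustration only, and the harmonization and inpainting performed by the diffusion sampler are not shown.

```python
import numpy as np


def duplicate_object(image, mask, dx, dy):
    """Paste a copy of the masked object, shifted by (dx, dy) pixels.

    Hypothetical helper for illustration, not taken from the paper.
    image: (H, W, C) array. mask: (H, W) boolean array marking the object.
    Returns the manipulated image and a boolean mask of the pasted copy.
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)          # coordinates of source object pixels
    ty, tx = ys + dy, xs + dx          # shifted target coordinates

    # Discard target pixels that would fall outside the image.
    inside = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)

    out = image.copy()
    # Read from the untouched input so overlapping regions copy cleanly.
    out[ty[inside], tx[inside]] = image[ys[inside], xs[inside]]

    target_mask = np.zeros((h, w), dtype=bool)
    target_mask[ty[inside], tx[inside]] = True
    return out, target_mask
```

In this sketch the object is deliberately left in place at its source location, since the abstract describes a duplicate being created and the original location being inpainted later during sampling. The returned `target_mask` is the kind of region a later harmonization step would presumably need, though how PixelMan consumes it is not specified in the abstract.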
