PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
December 18, 2024
Authors: Liyao Jiang, Negar Hassanpour, Mohammad Salameh, Mohammadreza Samadi, Jiao He, Fengyu Sun, Di Niu
cs.AI
Abstract
Recent research explores the potential of Diffusion Models (DMs) for
consistent object editing, which aims to modify object position, size, and
composition, etc., while preserving the consistency of objects and background
without changing their texture and attributes. Current inference-time methods
often rely on DDIM inversion, which inherently compromises efficiency and the
achievable consistency of edited images. Recent methods also utilize energy
guidance which iteratively updates the predicted noise and can drive the
latents away from the original image, resulting in distortions. In this paper,
we propose PixelMan, an inversion-free and training-free method for achieving
consistent object editing via Pixel Manipulation and generation, where we
directly create a duplicate copy of the source object at the target location in the
pixel space, and introduce an efficient sampling approach to iteratively
harmonize the manipulated object into the target location and inpaint its
original location, while ensuring image consistency by anchoring the edited
image to be generated to the pixel-manipulated image as well as by introducing
various consistency-preserving optimization techniques during inference.
Experimental evaluations based on benchmark datasets as well as extensive
visual comparisons show that in as few as 16 inference steps, PixelMan
outperforms a range of state-of-the-art training-based and training-free
methods (usually requiring 50 steps) on multiple consistent object editing
tasks.
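The pixel-manipulation step the abstract describes — directly duplicating the source object at the target location in pixel space, before any diffusion sampling — can be sketched as below. This is an illustrative toy re-implementation, not the authors' code: the function name `duplicate_object`, the list-of-lists image representation, and the mask/offset interface are all assumptions for demonstration.

```python
def duplicate_object(image, mask, offset):
    """Sketch of the pixel-space manipulation step: copy the object pixels
    selected by a binary mask to a location shifted by (dy, dx).

    image:  H x W grid (list of lists) of pixel values
    mask:   H x W grid of booleans marking the source object
    offset: (dy, dx) translation of the object

    Returns (edited image, shifted mask). The shifted mask marks the
    duplicated object's target region, which the sampling stage would
    then harmonize, while the original region would be inpainted.
    """
    h, w = len(image), len(image[0])
    dy, dx = offset
    out = [row[:] for row in image]            # keep the original image intact
    shifted = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            ty, tx = y + dy, x + dx
            if 0 <= ty < h and 0 <= tx < w:    # discard pixels pasted off-frame
                out[ty][tx] = image[y][x]      # paste the source pixel
                shifted[ty][tx] = True         # record the duplicated region
    return out, shifted
```

In the full method this manipulated image serves as the anchor that the edited image is generated against, which is what lets PixelMan avoid DDIM inversion entirely.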