PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
December 18, 2024
Authors: Liyao Jiang, Negar Hassanpour, Mohammad Salameh, Mohammadreza Samadi, Jiao He, Fengyu Sun, Di Niu
cs.AI
Abstract
Recent research explores the potential of Diffusion Models (DMs) for
consistent object editing, which aims to modify object position, size,
composition, etc., while preserving the consistency of objects and background
without changing their texture and attributes. Current inference-time methods
often rely on DDIM inversion, which inherently compromises efficiency and the
achievable consistency of edited images. Recent methods also utilize energy
guidance which iteratively updates the predicted noise and can drive the
latents away from the original image, resulting in distortions. In this paper,
we propose PixelMan, an inversion-free and training-free method that achieves
consistent object editing via Pixel Manipulation and generation. We directly
create a duplicate copy of the source object at the target location in pixel
space, then introduce an efficient sampling approach that iteratively
harmonizes the manipulated object into the target location and inpaints its
original location. Image consistency is ensured by anchoring the edited image
to be generated to the pixel-manipulated image, and by introducing various
consistency-preserving optimization techniques during inference.
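The anchoring idea above can be illustrated with a minimal numpy sketch. This is not the authors' implementation; `denoise_fn`, `anchored_step`, and the mask convention are assumptions, and the blend is shown in the spirit of common mask-based latent anchoring: after each sampler update, regions outside the edit mask are snapped back to the anchor derived from the pixel-manipulated image, so the background cannot drift from the original.

```python
import numpy as np

def anchored_step(latent, anchor_latent, edit_mask, denoise_fn):
    """One denoising step that re-anchors the non-edited regions.

    latent, anchor_latent: (C, H, W) arrays; edit_mask: (1, H, W) in {0, 1},
    with 1 where the model may change content (target and inpaint regions).
    denoise_fn stands in for a single sampler update (an assumption here).
    """
    latent = denoise_fn(latent)
    # Outside the edit regions, copy the anchor back in, so only the
    # harmonization/inpainting areas are free to change.
    return edit_mask * latent + (1.0 - edit_mask) * anchor_latent
```

The same masked blend is applied at every inference step, which is what keeps the edited image tied to the pixel-manipulated reference rather than to an inverted trajectory.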
Experimental evaluations based on benchmark datasets as well as extensive
visual comparisons show that in as few as 16 inference steps, PixelMan
outperforms a range of state-of-the-art training-based and training-free
methods (usually requiring 50 steps) on multiple consistent object editing
tasks.
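The pixel-manipulation step the abstract describes (duplicating the source object at the target location directly in pixel space) can be sketched as follows. This is a hedged illustration, not the paper's code: the function name, the boolean-mask interface, and the integer offset convention are all assumptions.

```python
import numpy as np

def manipulate_pixels(image, mask, dx, dy):
    """Copy the masked source object to a target offset in pixel space.

    image: (H, W, C) float array; mask: (H, W) boolean array marking the
    source object; (dx, dy) is the target offset in pixels.
    Returns the manipulated image and the object mask at the target location.
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)          # source-object pixel coordinates
    ty, tx = ys + dy, xs + dx          # shifted (target) coordinates
    # Keep only destination pixels that fall inside the canvas.
    valid = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
    out = image.copy()
    out[ty[valid], tx[valid]] = image[ys[valid], xs[valid]]
    target_mask = np.zeros_like(mask)
    target_mask[ty[valid], tx[valid]] = True
    return out, target_mask
```

In the method as described, this duplicated image then serves as the anchor for sampling, while the diffusion model harmonizes the copy into its new surroundings and inpaints the now-vacated source region.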