

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

November 11, 2024
Authors: Yoad Tewel, Rinon Gal, Dvir Samuel, Yuval Atzmon, Lior Wolf, Gal Chechik
cs.AI

Abstract

Adding objects into images based on text instructions is a challenging task in semantic image editing, requiring a balance between preserving the original scene and seamlessly integrating the new object in a fitting location. Despite extensive efforts, existing models often struggle with this balance, particularly with finding a natural location for adding an object in complex scenes. We introduce Add-it, a training-free approach that extends diffusion models' attention mechanisms to incorporate information from three key sources: the scene image, the text prompt, and the generated image itself. Our weighted extended-attention mechanism maintains structural consistency and fine details while ensuring natural object placement. Without task-specific fine-tuning, Add-it achieves state-of-the-art results on both real and generated image insertion benchmarks, including our newly constructed "Additing Affordance Benchmark" for evaluating object placement plausibility, outperforming supervised methods. Human evaluations show that Add-it is preferred in over 80% of cases, and it also demonstrates improvements in various automated metrics.
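
To make the weighted extended-attention idea concrete, below is a minimal PyTorch-style sketch of attending from the generated image's queries to keys and values drawn from the scene image, the text prompt, and the generated image itself, with per-source weights balancing the three information streams. All function names, tensor shapes, and weight values are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a weighted extended-attention step (illustrative only;
# names, shapes, and weighting scheme are assumptions, not the authors' code).
import torch

def weighted_extended_attention(q_gen, kv_scene, kv_prompt, kv_gen,
                                w_scene=1.0, w_prompt=1.0, w_gen=1.0):
    """Attend from generated-image queries to keys/values from three sources.

    q_gen: (B, N_gen, D) queries from the image being generated
    kv_*:  tuples (K, V), each of shape (B, N_source, D)
    w_*:   positive scalar weights balancing the three sources
    """
    d = q_gen.shape[-1]
    keys, values, weights = [], [], []
    for (k, v), w in [(kv_scene, w_scene), (kv_prompt, w_prompt), (kv_gen, w_gen)]:
        keys.append(k)
        values.append(v)
        # One weight per key token of this source, broadcastable over queries.
        weights.append(torch.full((k.shape[0], 1, k.shape[1]), w, device=k.device))

    K = torch.cat(keys, dim=1)      # (B, N_total, D)
    V = torch.cat(values, dim=1)    # (B, N_total, D)
    W = torch.cat(weights, dim=2)   # (B, 1, N_total)

    # Scaled dot-product attention; source weights applied multiplicatively
    # by shifting the logits before the softmax.
    logits = q_gen @ K.transpose(-1, -2) / d ** 0.5   # (B, N_gen, N_total)
    attn = (logits + W.log()).softmax(dim=-1)
    return attn @ V                                    # (B, N_gen, D)

# Example usage with random tensors (hypothetical sizes):
B, D = 1, 64
q_gen = torch.randn(B, 32, D)
kv_scene = (torch.randn(B, 32, D), torch.randn(B, 32, D))
kv_prompt = (torch.randn(B, 8, D), torch.randn(B, 8, D))
kv_gen = (q_gen, torch.randn(B, 32, D))
out = weighted_extended_attention(q_gen, kv_scene, kv_prompt, kv_gen, w_scene=1.1)
print(out.shape)  # torch.Size([1, 32, 64])
```

The weights act as a simple knob for trading off scene preservation (scene-image keys) against prompt adherence and self-consistency of the generated image; the paper's actual mechanism and weighting may differ.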
