Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
November 11, 2024
Authors: Yoad Tewel, Rinon Gal, Dvir Samuel, Yuval Atzmon, Lior Wolf, Gal Chechik
cs.AI
Abstract
Adding objects to images based on text instructions is a challenging task in
semantic image editing, requiring a balance between preserving the original
scene and seamlessly integrating the new object in a fitting location. Despite
extensive efforts, existing models often struggle with this balance,
particularly with finding a natural location for adding an object in complex
scenes. We introduce Add-it, a training-free approach that extends diffusion
models' attention mechanisms to incorporate information from three key sources:
the scene image, the text prompt, and the generated image itself. Our weighted
extended-attention mechanism maintains structural consistency and fine details
while ensuring natural object placement. Without task-specific fine-tuning,
Add-it achieves state-of-the-art results on both real and generated image
insertion benchmarks, including our newly constructed "Additing Affordance
Benchmark" for evaluating object placement plausibility, outperforming
supervised methods. Human evaluations show that Add-it is preferred in over 80%
of cases, and it also demonstrates improvements in various automated metrics.
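The abstract describes a weighted extended-attention mechanism in which the generated image's queries attend over keys and values drawn from three sources: the scene image, the text prompt, and the generated image itself. The following is a minimal sketch of that general idea, not the paper's actual implementation; the function name, the dictionary of per-source weights, and the use of a log-space bias to rebalance attention are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_extended_attention(q_gen, kv_scene, kv_text, kv_gen, weights):
    """Sketch of extended attention over three key/value sources.

    q_gen:    (n_q, d) queries from the generated image
    kv_*:     (keys, values) pairs, each (n_k, d), from the scene
              image, the text prompt, and the generated image
    weights:  hypothetical per-source multipliers, e.g.
              {"scene": 1.0, "text": 1.0, "gen": 1.0}
    """
    d = q_gen.shape[-1]
    # Concatenate keys and values from all three sources.
    keys = np.concatenate([kv_scene[0], kv_text[0], kv_gen[0]], axis=0)
    vals = np.concatenate([kv_scene[1], kv_text[1], kv_gen[1]], axis=0)
    logits = q_gen @ keys.T / np.sqrt(d)
    # Apply each source's weight as a log-space bias, so the softmax
    # rebalances attention mass between the three sources.
    bias = np.concatenate([
        np.full(kv_scene[0].shape[0], np.log(weights["scene"])),
        np.full(kv_text[0].shape[0], np.log(weights["text"])),
        np.full(kv_gen[0].shape[0], np.log(weights["gen"])),
    ])
    attn = softmax(logits + bias, axis=-1)
    return attn @ vals
```

With all weights set to 1.0 the bias vanishes and this reduces to ordinary attention over the concatenated sources; raising one source's weight shifts attention mass toward its tokens, which is one way to trade off scene preservation against prompt adherence.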