Click2Mask: Local Editing with Dynamic Mask Generation
September 12, 2024
Authors: Omer Regev, Omri Avrahami, Dani Lischinski
cs.AI
Abstract
Recent advancements in generative models have revolutionized image generation
and editing, making these tasks accessible to non-experts. This paper focuses
on local image editing, particularly the task of adding new content to a
loosely specified area. Existing methods often require a precise mask or a
detailed description of the location, which can be cumbersome and prone to
errors. We propose Click2Mask, a novel approach that simplifies the local
editing process by requiring only a single point of reference (in addition to
the content description). A mask is dynamically grown around this point during
a Blended Latent Diffusion (BLD) process, guided by a masked CLIP-based
semantic loss. Click2Mask surpasses the limitations of segmentation-based and
fine-tuning dependent methods, offering a more user-friendly and contextually
accurate solution. Our experiments demonstrate that Click2Mask not only
minimizes user effort but also delivers competitive or superior local image
manipulation results compared to SoTA methods, according to both human
judgement and automatic metrics. Key contributions include the simplification
of user input, the ability to freely add objects unconstrained by existing
segments, and the integration potential of our dynamic mask approach within
other editing methods.
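For intuition, below is a minimal sketch (not the authors' released code) of how a mask might be grown from a single click inside a Blended Latent Diffusion loop while being steered by a masked CLIP-based loss. The callables `denoise_step` and `clip_semantic_loss`, the max-pooling dilation, and the gradient-based mask update are all hypothetical stand-ins for the paper's actual components.

```python
import torch
import torch.nn.functional as F


def grow_mask(mask: torch.Tensor, kernel: int = 3) -> torch.Tensor:
    """Dilate a soft mask by one step via max-pooling (an assumed growth rule)."""
    return F.max_pool2d(mask, kernel_size=kernel, stride=1, padding=kernel // 2)


def click2mask_sketch(z_T, z_src, click_yx, denoise_step, clip_semantic_loss,
                      prompt, num_steps=50, lr=0.1):
    """Grow an edit mask from a single click during a BLD-style denoising loop.

    `denoise_step(z_t, t, prompt)` and `clip_semantic_loss(z_t, mask, prompt)`
    are hypothetical callables standing in for the diffusion model and the
    masked CLIP score; neither is the authors' actual interface.
    """
    _, _, h, w = z_T.shape
    # Seed the mask with the user's single point of reference.
    mask = torch.zeros(1, 1, h, w, device=z_T.device)
    mask[..., click_yx[0], click_yx[1]] = 1.0

    z_t = z_T
    for t in reversed(range(num_steps)):
        # Tentatively dilate the mask, then let the masked CLIP loss decide
        # which newly covered pixels to keep (assumed update rule).
        mask = grow_mask(mask).requires_grad_(True)
        loss = clip_semantic_loss(z_t, mask, prompt)
        (grad,) = torch.autograd.grad(loss, mask)
        mask = (mask - lr * grad).clamp(0.0, 1.0).detach()

        # BLD-style blending: denoise freely inside the mask, keep the
        # source latent (noised to level t in full BLD) outside it.
        z_edit = denoise_step(z_t, t, prompt)
        z_t = mask * z_edit + (1.0 - mask) * z_src

    return z_t, mask
```

The design point mirrored here is that the mask is not fixed up front: it evolves across denoising steps around the clicked point, which is why the edit is not constrained to any pre-existing segment.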