Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
December 19, 2024
Authors: Jixuan He, Wanhua Li, Ye Liu, Junsik Kim, Donglai Wei, Hanspeter Pfister
cs.AI
Abstract
As a common image editing operation, image composition involves integrating foreground objects into background scenes. In this paper, we extend the concept of affordance from human-centered image composition tasks to a more general object-scene composition framework, addressing the complex interplay between foreground objects and background scenes. Following the principle of affordance, we define the affordance-aware object insertion task, which aims to seamlessly insert any object into any scene given various position prompts. To address the limited data available for this task, we construct the SAM-FB dataset, which contains over 3 million examples across more than 3,000 object categories. Furthermore, we propose the Mask-Aware Dual Diffusion (MADD) model, which uses a dual-stream architecture to simultaneously denoise the RGB image and the insertion mask. By explicitly modeling the insertion mask in the diffusion process, MADD effectively enforces the notion of affordance. Extensive experimental results show that our method outperforms state-of-the-art methods and exhibits strong generalization on in-the-wild images. Our code is available at https://github.com/KaKituken/affordance-aware-any.
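To make the dual-stream idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of a denoiser that jointly predicts noise for an RGB image and an insertion mask. All module names, layer choices, shapes, and the simple feature-fusion scheme are assumptions made for clarity; the actual MADD architecture in the repository above will differ.

```python
# Hypothetical sketch of a dual-stream denoiser: two parallel streams
# (RGB and mask) that exchange features, so the insertion mask is
# modeled explicitly during diffusion. Shapes and layers are illustrative.
import math

import torch
import torch.nn as nn


def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Standard sinusoidal embedding of the diffusion timestep."""
    half = dim // 2
    freqs = torch.exp(
        -math.log(10000.0) / half
        * torch.arange(half, dtype=torch.float32, device=t.device)
    )
    args = t.float().unsqueeze(1) * freqs.unsqueeze(0)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=1)


class DualStreamDenoiser(nn.Module):
    """Two parallel conv streams with a shared fusion branch (illustrative)."""

    def __init__(self, ch: int = 64, t_dim: int = 128):
        super().__init__()
        self.rgb_in = nn.Conv2d(3, ch, 3, padding=1)
        self.mask_in = nn.Conv2d(1, ch, 3, padding=1)
        self.t_proj = nn.Linear(t_dim, ch)
        # Fusion lets each stream condition on the other.
        self.fuse = nn.Conv2d(2 * ch, ch, 1)
        self.rgb_out = nn.Conv2d(ch, 3, 3, padding=1)   # predicted RGB noise
        self.mask_out = nn.Conv2d(ch, 1, 3, padding=1)  # predicted mask noise
        self.t_dim = t_dim

    def forward(self, noisy_rgb, noisy_mask, t):
        temb = self.t_proj(timestep_embedding(t, self.t_dim))[:, :, None, None]
        h_rgb = torch.relu(self.rgb_in(noisy_rgb) + temb)
        h_mask = torch.relu(self.mask_in(noisy_mask) + temb)
        shared = self.fuse(torch.cat([h_rgb, h_mask], dim=1))
        return self.rgb_out(h_rgb + shared), self.mask_out(h_mask + shared)


if __name__ == "__main__":
    model = DualStreamDenoiser()
    rgb = torch.randn(2, 3, 64, 64)    # noisy composited image
    mask = torch.randn(2, 1, 64, 64)   # noisy insertion mask
    t = torch.randint(0, 1000, (2,))
    eps_rgb, eps_mask = model(rgb, mask, t)
    print(eps_rgb.shape, eps_mask.shape)
```

The design point the sketch tries to convey is simply that the mask is a denoising target in its own right, rather than a fixed conditioning input, so the model can jointly reason about where the object should go and how it should appear.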