ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation

December 11, 2024
Authors: Daniel Winter, Asaf Shul, Matan Cohen, Dana Berman, Yael Pritch, Alex Rav-Acha, Yedid Hoshen
cs.AI

Abstract

This paper introduces a tuning-free method for both object insertion and subject-driven generation. The task involves composing an object, given multiple views, into a scene specified by either an image or text. Existing methods struggle to fully meet the task's challenging objectives: (i) seamlessly composing the object into the scene with photorealistic pose and lighting, and (ii) preserving the object's identity. We hypothesize that achieving these goals requires large-scale supervision, but manually collecting sufficient data is simply too expensive. The key observation in this paper is that many mass-produced objects recur across multiple images of large unlabeled datasets, in different scenes, poses, and lighting conditions. We use this observation to create massive supervision by retrieving sets of diverse views of the same object. This powerful paired dataset enables us to train a straightforward text-to-image diffusion architecture to map the object and scene descriptions to the composited image. We compare our method, ObjectMate, with state-of-the-art methods for object insertion and subject-driven generation, using a single or multiple references. Empirically, ObjectMate achieves superior identity preservation and more photorealistic composition. Unlike many other multi-reference methods, ObjectMate does not require slow test-time tuning.
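The retrieval step described above — grouping crops of the same mass-produced object that recur across an unlabeled corpus — can be sketched as nearest-neighbor grouping over image embeddings. This is a minimal illustration, not the paper's actual pipeline: the embedding model, similarity threshold, and minimum group size here are all assumptions.

```python
import numpy as np

def retrieve_recurring_objects(embeddings, sim_threshold=0.9, min_views=3):
    """Group object crops whose embeddings are nearly identical, i.e.
    likely views of the same object in different scenes.

    embeddings: (N, D) array of L2-normalized object-crop features
    (e.g. from an off-the-shelf image encoder; the choice of encoder
    and thresholds is an assumption for illustration).
    """
    sims = embeddings @ embeddings.T          # cosine similarity (rows are unit norm)
    np.fill_diagonal(sims, -1.0)              # ignore trivial self-matches
    groups = []
    assigned = np.zeros(len(embeddings), dtype=bool)
    for i in range(len(embeddings)):
        if assigned[i]:
            continue
        matches = [j for j in np.where(sims[i] >= sim_threshold)[0]
                   if not assigned[j]]
        group = [i] + matches
        if len(group) >= min_views:           # keep only objects seen in several images
            groups.append(group)
            assigned[group] = True
    return groups

# Toy example: three noisy copies of one "object" plus one unrelated crop.
rng = np.random.default_rng(0)
base = rng.normal(size=8)
views = np.stack([base + 0.01 * rng.normal(size=8) for _ in range(3)]
                 + [rng.normal(size=8)])
views /= np.linalg.norm(views, axis=1, keepdims=True)
print(retrieve_recurring_objects(views))      # the three noisy copies form one group
```

In a real pipeline the pairwise similarity matrix would not fit in memory for web-scale data, so an approximate nearest-neighbor index would replace the dense `N x N` computation; the grouping logic stays the same.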

