UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
December 19, 2024
Authors: Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari
cs.AI
Abstract
We propose an unsupervised model for instruction-based image editing that
eliminates the need for ground-truth edited images during training. Existing
supervised methods depend on datasets containing triplets of input image,
edited image, and edit instruction. These are generated either by existing
editing methods or by human annotation, which introduces biases and limits their
generalization ability. Our method addresses these challenges by introducing a
novel editing mechanism called Cycle Edit Consistency (CEC), which applies
forward and backward edits in one training step and enforces consistency in
image and attention spaces. This allows us to bypass the need for ground-truth
edited images and unlock training for the first time on datasets comprising
either real image-caption pairs or image-caption-edit triplets. We empirically
show that our unsupervised technique performs better across a broader range of
edits with high fidelity and precision. By eliminating the need for
pre-existing datasets of triplets, reducing biases associated with supervised
methods, and proposing CEC, our work represents a significant advancement in
unlocking the scaling of instruction-based image editing.
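The core of CEC described above can be illustrated with a minimal sketch of one training step. Everything here is a simplifying assumption for illustration: `ToyEditor` is a hypothetical stand-in for the paper's instruction-conditioned editing model, the instructions are random vectors rather than encoded text, and only the image-space cycle loss is shown (the attention-space consistency term is omitted).

```python
# Minimal sketch of a Cycle Edit Consistency (CEC) training step.
# The editor architecture, instruction encoding, and loss are illustrative
# assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn

class ToyEditor(nn.Module):
    """Hypothetical stand-in for an instruction-conditioned image editor."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim * 2, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, image, instruction):
        # Condition the edit on the instruction by concatenation.
        return self.net(torch.cat([image, instruction], dim=-1))

def cec_loss(editor, image, fwd_instr, bwd_instr):
    edited = editor(image, fwd_instr)           # forward edit
    reconstructed = editor(edited, bwd_instr)   # backward (inverse) edit
    # Cycle consistency in image space: applying the forward and then the
    # backward instruction should recover the input image, so no
    # ground-truth edited image is needed for supervision.
    return nn.functional.mse_loss(reconstructed, image)

torch.manual_seed(0)
dim = 64
editor = ToyEditor(dim)
opt = torch.optim.Adam(editor.parameters(), lr=1e-3)

# Toy batch: images plus paired forward/backward instruction embeddings
# (e.g. "add a hat" / "remove the hat").
image = torch.randn(4, dim)
fwd_instr, bwd_instr = torch.randn(4, dim), torch.randn(4, dim)

for _ in range(50):
    opt.zero_grad()
    loss = cec_loss(editor, image, fwd_instr, bwd_instr)
    loss.backward()
    opt.step()
```

In this sketch both edits share one set of weights and happen within a single training step, mirroring the abstract's description; the actual method additionally enforces consistency in attention space.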