UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency

December 19, 2024
作者: Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari
cs.AI

Abstract

We propose an unsupervised model for instruction-based image editing that eliminates the need for ground-truth edited images during training. Existing supervised methods depend on datasets containing triplets of input image, edited image, and edit instruction. These triplets are generated either by existing editing methods or by human annotation, which introduces biases and limits their generalization ability. Our method addresses these challenges by introducing a novel editing mechanism called Cycle Edit Consistency (CEC), which applies forward and backward edits in one training step and enforces consistency in image and attention spaces. This allows us to bypass the need for ground-truth edited images and unlock training for the first time on datasets comprising either real image-caption pairs or image-caption-edit triplets. We empirically show that our unsupervised technique performs better across a broader range of edits with high fidelity and precision. By eliminating the need for pre-existing datasets of triplets, reducing biases associated with supervised methods, and proposing CEC, our work represents a significant advancement in unlocking the scaling of instruction-based image editing.
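
To make the Cycle Edit Consistency idea concrete, the sketch below shows what one forward/backward training step could look like. It assumes an editing network that returns both an edited image and its attention maps, and that each instruction comes paired with a reverse instruction; `edit_model`, `reverse_instruction`, and the loss weights `lambda_img` / `lambda_attn` are illustrative placeholders, not the authors' actual interface or hyperparameters.

```python
# Minimal, hypothetical sketch of a Cycle Edit Consistency (CEC) training step.
# `edit_model(image, instruction)` is assumed to return (edited_image, attention_maps);
# it stands in for the paper's instruction-based editor and is not its real API.
import torch.nn.functional as F

def cec_training_step(edit_model, image, instruction, reverse_instruction,
                      lambda_img=1.0, lambda_attn=0.1):
    """One forward + backward edit cycle with consistency losses."""
    # Forward edit: apply the instruction (e.g. "make the sky sunset orange").
    edited, attn_fwd = edit_model(image, instruction)

    # Backward edit: apply the reverse instruction to undo the change
    # (e.g. "make the sky blue again") and try to recover the input image.
    reconstructed, attn_bwd = edit_model(edited, reverse_instruction)

    # Image-space cycle consistency: the reconstruction should match the input,
    # which removes the need for a ground-truth edited image.
    loss_img = F.mse_loss(reconstructed, image)

    # Attention-space consistency: forward and backward edits should focus on
    # the same regions, keeping the edit localized to what the instruction names.
    loss_attn = F.mse_loss(attn_fwd, attn_bwd)

    return lambda_img * loss_img + lambda_attn * loss_attn
```

Because both edits happen in a single step and only the original image is needed as supervision, this kind of objective can in principle be trained on plain image-caption data, which is the scaling argument the abstract makes.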
