Concept Lancet: Image Editing with Compositional Representation Transplant

April 3, 2025
Authors: Jinqi Luo, Tianjiao Ding, Kwan Ho Ryan Chan, Hancheng Min, Chris Callison-Burch, René Vidal
cs.AI

Abstract

Diffusion models are widely used for image editing tasks. Existing editing methods often design a representation manipulation procedure by curating an edit direction in the text embedding or score space. However, such a procedure faces a key challenge: overestimating the edit strength harms visual consistency while underestimating it fails the editing task. Notably, each source image may require a different editing strength, and it is costly to search for an appropriate strength via trial-and-error. To address this challenge, we propose Concept Lancet (CoLan), a zero-shot plug-and-play framework for principled representation manipulation in diffusion-based image editing. At inference time, we decompose the source input in the latent (text embedding or diffusion score) space as a sparse linear combination of the representations of the collected visual concepts. This allows us to accurately estimate the presence of concepts in each image, which informs the edit. Based on the editing task (replace/add/remove), we perform a customized concept transplant process to impose the corresponding editing direction. To sufficiently model the concept space, we curate a conceptual representation dataset, CoLan-150K, which contains diverse descriptions and scenarios of visual terms and phrases for the latent dictionary. Experiments on multiple diffusion-based image editing baselines show that methods equipped with CoLan achieve state-of-the-art performance in editing effectiveness and consistency preservation.
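
The decompose-then-transplant idea in the abstract can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the authors' implementation: the dictionary `D` is random rather than built from CoLan-150K, the sparse solver is plain ISTA (the abstract does not say which solver CoLan uses), and the names `sparse_decompose` and `transplant_replace` are hypothetical. It shows how a coefficient estimated from the source vector, rather than a hand-tuned edit strength, can scale a "replace" edit.

```python
import numpy as np


def sparse_decompose(x, D, lam=0.05, n_iter=500):
    """Approximately solve min_w 0.5*||x - D w||^2 + lam*||w||_1 with ISTA.

    ISTA is an assumption made for this sketch; any sparse coding solver
    could be substituted.
    """
    # Step size = 1 / Lipschitz constant of the gradient (squared spectral norm of D).
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - x)
        z = w - step * grad
        # Soft-thresholding promotes sparsity in the coefficients.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w


def transplant_replace(x, D, w, src, tgt):
    """Hypothetical 'replace' edit: remove the source concept's estimated
    contribution and add the target concept with the same coefficient."""
    return x - w[src] * D[:, src] + w[src] * D[:, tgt]


# Toy dictionary: 20 random unit-norm "concept" vectors in a 64-dim latent space.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 20))
D /= np.linalg.norm(D, axis=0)

# Synthetic source vector built from two concepts (indices 3 and 7).
w_true = np.zeros(20)
w_true[3] = 2.0
w_true[7] = -1.0
x = D @ w_true

w = sparse_decompose(x, D)
edited = transplant_replace(x, D, w, src=3, tgt=11)
```

In this sketch the edit strength is `w[src]`, which is read off the decomposition of each source vector instead of being searched for by trial and error. The add and remove tasks mentioned in the abstract would need their own transplant rules, which the abstract does not spell out.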
