
Concept Lancet: Image Editing with Compositional Representation Transplant

April 3, 2025
Authors: Jinqi Luo, Tianjiao Ding, Kwan Ho Ryan Chan, Hancheng Min, Chris Callison-Burch, René Vidal
cs.AI

Abstract

Diffusion models are widely used for image editing tasks. Existing editing methods often design a representation manipulation procedure by curating an edit direction in the text embedding or score space. However, such a procedure faces a key challenge: overestimating the edit strength harms visual consistency while underestimating it fails the editing task. Notably, each source image may require a different editing strength, and it is costly to search for an appropriate strength via trial-and-error. To address this challenge, we propose Concept Lancet (CoLan), a zero-shot plug-and-play framework for principled representation manipulation in diffusion-based image editing. At inference time, we decompose the source input in the latent (text embedding or diffusion score) space as a sparse linear combination of the representations of the collected visual concepts. This allows us to accurately estimate the presence of concepts in each image, which informs the edit. Based on the editing task (replace/add/remove), we perform a customized concept transplant process to impose the corresponding editing direction. To sufficiently model the concept space, we curate a conceptual representation dataset, CoLan-150K, which contains diverse descriptions and scenarios of visual terms and phrases for the latent dictionary. Experiments on multiple diffusion-based image editing baselines show that methods equipped with CoLan achieve state-of-the-art performance in editing effectiveness and consistency preservation.
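
The decompose-then-transplant idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the ISTA solver, the function names `sparse_decompose` and `transplant_replace`, the penalty `lam`, and the plain vector form of the "replace" edit are all assumptions chosen for illustration. The sketch treats the source representation as a vector `x`, the concept dictionary as a matrix `D` whose columns are concept vectors, and uses the estimated coefficient of the source concept as the edit strength.

```python
import numpy as np

def sparse_decompose(x, D, lam=0.05, n_iter=500):
    """Estimate sparse coefficients w such that x is approximately D @ w.

    Solves min_w 0.5 * ||x - D w||^2 + lam * ||w||_1 with ISTA
    (an assumed solver; the abstract only states that the
    decomposition is a sparse linear combination).
    """
    # Step size 1/L, where L is the squared spectral norm of D.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # Gradient step on the least-squares term.
        z = w - step * (D.T @ (D @ w - x))
        # Soft-thresholding enforces sparsity.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w

def transplant_replace(x, D, w, src, tgt_vec):
    """Hypothetical 'replace' edit: swap the source concept's contribution
    for the target concept, scaled by the estimated coefficient w[src]."""
    return x + w[src] * (tgt_vec - D[:, src])
```

Under these assumptions, a caller would first compute `w = sparse_decompose(x, D)`, read off `w[src]` as the per-image presence of the source concept, and then call `transplant_replace` so that the edit strength is estimated from the image rather than tuned by trial and error.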

