Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model
April 8, 2025
Authors: Qi Mao, Lan Chen, Yuchao Gu, Mike Zheng Shou, Ming-Hsuan Yang
cs.AI
Abstract
Balancing fidelity and editability is essential in text-based image editing
(TIE), where failures commonly lead to over- or under-editing issues. Existing
methods typically rely on attention injections for structure preservation and
leverage the inherent text alignment capabilities of pre-trained text-to-image
(T2I) models for editability, but they lack explicit and unified mechanisms to
properly balance these two objectives. In this work, we introduce UnifyEdit, a
tuning-free method that performs diffusion latent optimization to enable a
balanced integration of fidelity and editability within a unified framework.
Unlike direct attention injections, we develop two attention-based constraints:
a self-attention (SA) preservation constraint for structural fidelity, and a
cross-attention (CA) alignment constraint to enhance text alignment for
improved editability. However, simultaneously applying both constraints can
lead to gradient conflicts, where the dominance of one constraint results in
over- or under-editing. To address this challenge, we introduce an adaptive
time-step scheduler that dynamically adjusts the influence of these
constraints, guiding the diffusion latent toward an optimal balance. Extensive
quantitative and qualitative experiments validate the effectiveness of our
approach, demonstrating its superiority in achieving a robust balance between
structure preservation and text alignment across various editing tasks,
outperforming other state-of-the-art methods. The source code will be available
at https://github.com/CUC-MIPG/UnifyEdit.
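The gradient conflict described in the abstract can be illustrated with a toy example. The sketch below is not the UnifyEdit implementation: the abstract does not define the SA and CA constraints or the scheduler, so the code assumes two quadratic stand-in losses (distance to a "source" point for fidelity, distance to a "target" point for editability) and a hypothetical weighting rule that gives more weight to whichever constraint is currently more violated. All function and parameter names are invented for illustration.

```python
def sq_dist(z, target):
    """Squared Euclidean distance, used here as a stand-in loss."""
    return sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def grad_sq_dist(z, target):
    """Gradient of sq_dist with respect to z."""
    return [2.0 * (zi - ti) for zi, ti in zip(z, target)]


def adaptive_weights(loss_fid, loss_edit):
    """Hypothetical balancing rule (an assumption, not the paper's scheduler):
    weight each constraint by its share of the total loss."""
    total = loss_fid + loss_edit
    if total == 0.0:
        return 0.5, 0.5
    return loss_fid / total, loss_edit / total


def optimize_latent(z, z_src, z_tgt, steps=200, lr=0.1):
    """Toy latent optimization under two competing constraints.

    z_src stands in for the structure to preserve (fidelity) and z_tgt for
    the text-aligned result (editability). Returns the final latent and the
    number of steps on which the two gradients pointed in opposing directions.
    """
    z = list(z)
    conflicts = 0
    for _ in range(steps):
        g_fid = grad_sq_dist(z, z_src)
        g_edit = grad_sq_dist(z, z_tgt)
        # A negative dot product means the constraints pull in opposing directions.
        if sum(a * b for a, b in zip(g_fid, g_edit)) < 0:
            conflicts += 1
        w_fid, w_edit = adaptive_weights(sq_dist(z, z_src), sq_dist(z, z_tgt))
        z = [zi - lr * (w_fid * gf + w_edit * ge)
             for zi, gf, ge in zip(z, g_fid, g_edit)]
    return z, conflicts


if __name__ == "__main__":
    z_final, n_conflicts = optimize_latent([0.0, 0.0], [0.0, 0.0], [2.0, 2.0])
    print(z_final, n_conflicts)
```

With fixed weights, the latent would settle wherever the fixed ratio dictates, which corresponds to over-editing if the editability term dominates and under-editing if the fidelity term does. In this toy setup, the loss-proportional rule instead moves the latent toward a point where neither stand-in constraint dominates.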