Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts
April 17, 2025
Authors: Leyang Li, Shilin Lu, Yan Ren, Adams Wai-Kin Kong
cs.AI
Abstract
Ensuring the ethical deployment of text-to-image models requires effective
techniques to prevent the generation of harmful or inappropriate content. While
concept erasure methods offer a promising solution, existing finetuning-based
approaches suffer from notable limitations. Anchor-free methods risk disrupting
sampling trajectories, leading to visual artifacts, while anchor-based methods
rely on the heuristic selection of anchor concepts. To overcome these
shortcomings, we introduce a finetuning framework, dubbed ANT, which
Automatically guides deNoising Trajectories to avoid unwanted concepts. ANT is
built on a key insight: reversing the condition direction of classifier-free
guidance during mid-to-late denoising stages enables precise content
modification without sacrificing early-stage structural integrity. This
inspires a trajectory-aware objective that preserves the integrity of the
early-stage score function field, which steers samples toward the natural image
manifold, without relying on heuristic anchor concept selection. For
single-concept erasure, we propose an augmentation-enhanced weight saliency map
to precisely identify the critical parameters that most significantly
contribute to the unwanted concept, enabling more thorough and efficient
erasure. For multi-concept erasure, our objective function offers a versatile
plug-and-play solution that significantly boosts performance. Extensive
experiments demonstrate that ANT achieves state-of-the-art results in both
single- and multi-concept erasure, delivering high-quality, safe outputs without
compromising generative fidelity. Code is available at
https://github.com/lileyang1210/ANT.
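
The key insight stated above, reversing the condition direction of classifier-free guidance in the mid-to-late denoising stages, can be illustrated with a minimal sketch. This is not the authors' implementation: ANT is a finetuning framework, and the snippet only shows the sampling-time observation that motivates it. The function name `guided_noise`, the parameter `reverse_from`, and its default value of 0.3 are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def guided_noise(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    scale: float,
    step: int,
    num_steps: int,
    reverse_from: float = 0.3,  # assumed switch point, not from the paper
) -> List[float]:
    """Combine noise predictions with classifier-free guidance (CFG).

    Steps are indexed 0..num_steps-1 in sampling order (0 is the noisiest).
    Early steps use standard CFG, which keeps the coarse structure intact.
    Once the fraction of completed steps reaches `reverse_from`, the
    conditional direction (eps_cond - eps_uncond) is negated, pushing the
    sample away from the concept described by the condition.
    """
    progress = step / max(num_steps - 1, 1)
    sign = 1.0 if progress < reverse_from else -1.0
    return [u + sign * scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy two-dimensional "noise predictions" for a 10-step sampler.
early = guided_noise([0.0, 1.0], [1.0, 3.0], scale=2.0, step=0, num_steps=10)
late = guided_noise([0.0, 1.0], [1.0, 3.0], scale=2.0, step=9, num_steps=10)
```

In this toy case the early step moves toward the conditional prediction and the late step moves away from it by the same amount. In a real sampler the two predictions would come from a diffusion model evaluated with and without the text condition.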