Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts
April 17, 2025
Authors: Leyang Li, Shilin Lu, Yan Ren, Adams Wai-Kin Kong
cs.AI
Abstract
Ensuring the ethical deployment of text-to-image models requires effective
techniques to prevent the generation of harmful or inappropriate content. While
concept erasure methods offer a promising solution, existing finetuning-based
approaches suffer from notable limitations. Anchor-free methods risk disrupting
sampling trajectories, leading to visual artifacts, while anchor-based methods
rely on the heuristic selection of anchor concepts. To overcome these
shortcomings, we introduce a finetuning framework, dubbed ANT, which
Automatically guides deNoising Trajectories to avoid unwanted concepts. ANT is
built on a key insight: reversing the condition direction of classifier-free
guidance during mid-to-late denoising stages enables precise content
modification without sacrificing early-stage structural integrity. This
inspires a trajectory-aware objective that preserves the integrity of the
early-stage score function field, which steers samples toward the natural image
manifold, without relying on heuristic anchor concept selection. For
single-concept erasure, we propose an augmentation-enhanced weight saliency map
to precisely identify the critical parameters that most significantly
contribute to the unwanted concept, enabling more thorough and efficient
erasure. For multi-concept erasure, our objective function offers a versatile
plug-and-play solution that significantly boosts performance. Extensive
experiments demonstrate that ANT achieves state-of-the-art results in both
single- and multi-concept erasure, delivering high-quality, safe outputs without
compromising generative fidelity. Code is available at
https://github.com/lileyang1210/ANT
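The key insight in the abstract, reversing the condition direction of classifier-free guidance during mid-to-late denoising, can be illustrated with a minimal sketch. This is not the ANT finetuning objective or the authors' code. It only shows the sign flip at sampling time, assuming the standard guidance formula eps = eps_uncond + scale * (eps_cond - eps_uncond). The function names, the step-counting convention (step 0 is the first, noisiest denoising step), and the `switch_step` parameter are hypothetical and introduced here for illustration.

```python
from typing import List


def cfg_noise(eps_uncond: List[float], eps_cond: List[float],
              scale: float, reverse: bool = False) -> List[float]:
    """Combine unconditional and conditional noise predictions.

    Standard classifier-free guidance moves along (eps_cond - eps_uncond).
    With reverse=True the same direction is negated, steering away from
    the conditioned concept instead of toward it.
    """
    sign = -1.0 if reverse else 1.0
    return [u + sign * scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def steered_noise(eps_uncond: List[float], eps_cond: List[float],
                  scale: float, step: int, switch_step: int) -> List[float]:
    """Hypothetical schedule: normal guidance early, reversed guidance later.

    Steps before switch_step (early denoising, where coarse structure forms)
    use the usual direction; steps at or after switch_step (mid-to-late
    denoising) use the reversed direction.
    """
    return cfg_noise(eps_uncond, eps_cond, scale, reverse=step >= switch_step)


if __name__ == "__main__":
    eps_u = [0.0, 1.0]   # toy unconditional prediction
    eps_c = [1.0, 3.0]   # toy prediction conditioned on the unwanted concept
    print(steered_noise(eps_u, eps_c, scale=2.0, step=5, switch_step=20))
    print(steered_noise(eps_u, eps_c, scale=2.0, step=30, switch_step=20))
```

In this toy schedule the early steps are left untouched, which mirrors the abstract's claim that early-stage structure is preserved while later steps are pushed away from the unwanted concept. Where to place the switch point is a design choice the abstract does not specify.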