

Personalize Anything for Free with Diffusion Transformer

March 16, 2025
Authors: Haoran Feng, Zehuan Huang, Lin Li, Hairong Lv, Lu Sheng
cs.AI

Abstract

Personalized image generation aims to produce images of user-specified concepts while enabling flexible editing. Recent training-free approaches, while exhibiting higher computational efficiency than training-based methods, struggle with identity preservation, applicability, and compatibility with diffusion transformers (DiTs). In this paper, we uncover the untapped potential of DiTs, where simply replacing denoising tokens with those of a reference subject achieves zero-shot subject reconstruction. This simple yet effective feature injection technique unlocks diverse scenarios, from personalization to image editing. Building upon this observation, we propose Personalize Anything, a training-free framework that achieves personalized image generation in DiT through: 1) timestep-adaptive token replacement that enforces subject consistency via early-stage injection and enhances flexibility through late-stage regularization, and 2) patch perturbation strategies to boost structural diversity. Our method seamlessly supports layout-guided generation, multi-subject personalization, and mask-controlled editing. Evaluations demonstrate state-of-the-art performance in identity preservation and versatility. Our work establishes new insights into DiTs while delivering a practical paradigm for efficient personalization.
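To make the two mechanisms in the abstract concrete, the following is a minimal sketch of timestep-adaptive token replacement and patch perturbation on toy token arrays. All function names, the `tau`/`alpha` parameters, and the specific blending rule are illustrative assumptions, not the paper's actual implementation; in the real method these operations would act on DiT latent tokens inside a denoising loop.

```python
import numpy as np

def timestep_adaptive_replace(denoise_tokens, ref_tokens, mask, t, T,
                              tau=0.7, alpha=0.3):
    """Hypothetical sketch of timestep-adaptive token replacement.

    denoise_tokens: (N, D) current denoising tokens
    ref_tokens:     (N, D) tokens from the reference subject's denoising path
    mask:           (N,) boolean, True where the subject should appear
    t, T:           current timestep (counted down from T) and total steps
    tau:            fraction of steps treated as "early" (hard injection)
    alpha:          late-stage blend weight (soft regularization)
    """
    out = denoise_tokens.copy()
    if t > tau * T:
        # Early stage: hard replacement enforces subject consistency.
        out[mask] = ref_tokens[mask]
    else:
        # Late stage: soft pull toward the reference keeps flexibility.
        out[mask] = (1 - alpha) * out[mask] + alpha * ref_tokens[mask]
    return out

def patch_perturbation(tokens, mask, sigma=0.05, seed=0):
    """Add small noise to injected patches to boost structural diversity."""
    rng = np.random.default_rng(seed)
    out = tokens.copy()
    out[mask] += sigma * rng.standard_normal(out[mask].shape)
    return out

# Toy usage: early steps copy the reference tokens verbatim inside the
# subject mask; late steps only nudge the tokens toward the reference.
tokens = np.zeros((8, 4))
ref = np.ones((8, 4))
mask = np.array([True] * 4 + [False] * 4)
early = timestep_adaptive_replace(tokens, ref, mask, t=40, T=50)
late = timestep_adaptive_replace(tokens, ref, mask, t=10, T=50)
```

The early/late split mirrors the abstract's claim: hard injection at high noise levels locks in identity, while late-stage regularization leaves room for layout and editing flexibility.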
