DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning
April 20, 2025
Authors: Fulong Ye, Miao Hua, Pengze Zhang, Xinghui Li, Qichao Sun, Songtao Zhao, Qian He, Xinglong Wu
cs.AI
Abstract
In this paper, we introduce DreamID, a diffusion-based face swapping model
that achieves high levels of ID similarity, attribute preservation, image
fidelity, and fast inference speed. Unlike the typical face swapping training
process, which often relies on implicit supervision and struggles to achieve
satisfactory results, DreamID establishes explicit supervision for face
swapping by constructing Triplet ID Group data, significantly enhancing
identity similarity and attribute preservation. The iterative nature of
diffusion models poses challenges for utilizing efficient image-space loss
functions, as performing time-consuming multi-step sampling to obtain the
generated image during training is impractical. To address this issue, we
leverage the accelerated diffusion model SD Turbo, reducing the inference steps
to a single iteration, enabling efficient pixel-level end-to-end training with
explicit Triplet ID Group supervision. Additionally, we propose an improved
diffusion-based model architecture comprising SwapNet, FaceNet, and ID Adapter.
This robust architecture fully unlocks the power of the Triplet ID Group
explicit supervision. Finally, to further extend our method, we explicitly
modify the Triplet ID Group data during training to fine-tune and preserve
specific attributes, such as glasses and face shape. Extensive experiments
demonstrate that DreamID outperforms state-of-the-art methods in terms of
identity similarity, pose and expression preservation, and image fidelity.
Overall, DreamID achieves high-quality face swapping results at 512×512
resolution in just 0.6 seconds and performs exceptionally well in challenging
scenarios such as complex lighting, large pose angles, and occlusions.
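The abstract's central training argument is that a one-step generator makes image-space losses against an explicit target affordable, while multi-step sampling during training does not. The toy sketch below illustrates only that cost argument. Everything in it is hypothetical: the TripletIDGroup field names and roles (identity source, attribute source, ground-truth target) are assumptions, because the abstract does not spell out the triplet layout, and toy_step is a stand-in blend, not SD Turbo, SwapNet, FaceNet, or the ID Adapter. Images are flattened lists of floats so the example runs with the Python standard library alone.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[float]  # toy "image": flattened pixel intensities
StepFn = Callable[[Image, Image], Image]  # (current image, identity image) -> next image


@dataclass
class TripletIDGroup:
    # Hypothetical layout; the abstract does not define the triplet's fields.
    id_image: Image    # supplies the identity to transfer
    attr_image: Image  # supplies pose, expression, lighting, background
    target: Image      # ground-truth swap result (the explicit supervision)


def pixel_l2_loss(pred: Image, target: Image) -> float:
    """Mean squared error computed directly in image (pixel) space."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def training_loss(step_fn: StepFn, triplet: TripletIDGroup, num_steps: int) -> Tuple[float, int]:
    """Generate an output image, then score it against the triplet's target.

    Returns (loss, generator_calls). The call count is the cost of obtaining
    an image before any image-space loss can be computed.
    """
    x = list(triplet.attr_image)  # toy start point, not how a diffusion model is initialized
    calls = 0
    for _ in range(num_steps):
        x = step_fn(x, triplet.id_image)
        calls += 1
    return pixel_l2_loss(x, triplet.target), calls


def toy_step(x: Image, identity: Image) -> Image:
    # Hypothetical stand-in for one generator pass: blend halfway toward the identity image.
    return [0.5 * a + 0.5 * b for a, b in zip(x, identity)]


triplet = TripletIDGroup(id_image=[1.0, 1.0], attr_image=[0.0, 0.0], target=[0.5, 0.5])

print(training_loss(toy_step, triplet, num_steps=1))      # (0.0, 1): one call per sample
print(training_loss(toy_step, triplet, num_steps=50)[1])  # 50 calls before the loss exists
```

The point of the sketch is the call count: with num_steps=1, the generated image, and therefore the pixel-level loss, costs one generator call per training sample, whereas a 50-step sampler would need 50 calls before any image-space loss could be computed. A real training setup would also backpropagate through the generator and may combine several image-space terms; this sketch omits both.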