DreamID: High-Fidelity and Fast Diffusion-Based Face Swapping via Triplet ID Group Learning

Abstract

In this paper, we introduce DreamID, a diffusion-based face swapping model that achieves high ID similarity, strong attribute preservation, high image fidelity, and fast inference. Unlike typical face swapping training, which relies on implicit supervision and often struggles to achieve satisfactory results, DreamID establishes explicit supervision for face swapping by constructing Triplet ID Group data, significantly improving identity similarity and attribute preservation. The iterative nature of diffusion models makes it difficult to use efficient image-space loss functions, because running time-consuming multi-step sampling to obtain the generated image at every training step is impractical. To address this, we leverage the accelerated diffusion model SD Turbo, which reduces inference to a single step and thereby enables efficient pixel-level end-to-end training with explicit Triplet ID Group supervision. We also propose an improved diffusion-based architecture comprising SwapNet, FaceNet, and ID Adapter, which fully exploits the explicit Triplet ID Group supervision. Finally, to further extend our method, we explicitly modify the Triplet ID Group data during training to fine-tune and preserve specific attributes, such as glasses and face shape. Extensive experiments demonstrate that DreamID outperforms state-of-the-art methods in identity similarity, pose and expression preservation, and image fidelity. Overall, DreamID produces high-quality face swapping results at 512×512 resolution in just 0.6 seconds and performs particularly well in challenging scenarios such as complex lighting, large pose angles, and occlusions.
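The central idea, that a one-step generator lets an explicit target supervise the output directly in image space, can be illustrated with a toy sketch. It is not DreamID's implementation: it assumes a hypothetical `TripletIDGroup` record with `id_image`, `attr_image`, and `target` fields, treats images as flat lists of floats, and adds an identity-similarity term computed by a placeholder `face_embed` function. None of these names, nor the exact composition of the loss, are taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

# Toy stand-in: an "image" is a flat list of pixel values.
Image = List[float]


@dataclass
class TripletIDGroup:
    """Hypothetical container for one training triplet (illustrative names only)."""
    id_image: Image    # face that supplies the identity
    attr_image: Image  # face that supplies pose, expression, and lighting
    target: Image      # explicit ground-truth result for this swap


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two images with the same number of pixels."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def triplet_loss(
    generator: Callable[[Image, Image], Image],
    face_embed: Callable[[Image], List[float]],
    group: TripletIDGroup,
    id_weight: float = 1.0,
) -> float:
    # A one-step generator produces the final image in a single forward pass,
    # so image-space losses can be computed on its output directly.
    swapped = generator(group.id_image, group.attr_image)
    # Pixel-level reconstruction against the explicit target.
    pixel_loss = mse(swapped, group.target)
    # Assumed identity term: keep the swapped face's embedding close to the source ID.
    id_loss = 1.0 - cosine_similarity(face_embed(swapped), face_embed(group.id_image))
    return pixel_loss + id_weight * id_loss


if __name__ == "__main__":
    group = TripletIDGroup(
        id_image=[0.2, 0.4, 0.6],
        attr_image=[0.9, 0.1, 0.5],
        target=[0.2, 0.4, 0.6],
    )
    identity_embed = lambda img: img  # placeholder face embedding
    oracle = lambda id_img, attr_img: list(group.target)  # perfect swap
    copier = lambda id_img, attr_img: list(attr_img)      # ignores the identity
    print("oracle loss:", triplet_loss(oracle, identity_embed, group))
    print("copier loss:", triplet_loss(copier, identity_embed, group))
```

Because `generator` is called once per triplet, the pixel loss costs a single forward pass. With a multi-step sampler, the same loss would require running the whole sampling chain inside every training step, which is the cost described above as impractical.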
