Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers
January 7, 2025
Authors: Yuechen Zhang, Yaoyang Liu, Bin Xia, Bohao Peng, Zexin Yan, Eric Lo, Jiaya Jia
cs.AI
Abstract
We present Magic Mirror, a framework for generating identity-preserved videos
with cinematic-level quality and dynamic motion. While recent advances in video
diffusion models have shown impressive capabilities in text-to-video
generation, maintaining consistent identity while producing natural motion
remains challenging. Previous methods either require person-specific
fine-tuning or struggle to balance identity preservation with motion diversity.
Built upon Video Diffusion Transformers, our method introduces three key
components: (1) a dual-branch facial feature extractor that captures both
identity and structural features, (2) a lightweight cross-modal adapter with
Conditioned Adaptive Normalization for efficient identity integration, and (3)
a two-stage training strategy combining synthetic identity pairs with video
data. Extensive experiments demonstrate that Magic Mirror effectively balances
identity consistency with natural motion, outperforming existing methods across
multiple metrics while requiring minimal parameters added. The code and model
will be made publicly available at:
https://github.com/dvlab-research/MagicMirror/
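The abstract does not give the exact formulation of Conditioned Adaptive Normalization, so the sketch below only illustrates the general idea in an adaLN-style way: an identity embedding from the facial feature extractor predicts per-channel scale and shift parameters that modulate the normalized tokens inside a diffusion transformer block. All names here (CondAdaNorm, id_dim, hidden_dim) are assumptions for illustration, not the paper's actual API.

```python
# Minimal sketch, NOT the paper's implementation: a generic adaLN-style
# conditioned normalization layer driven by a facial identity embedding.
import torch
import torch.nn as nn


class CondAdaNorm(nn.Module):
    """LayerNorm whose scale/shift are predicted from an identity condition."""

    def __init__(self, hidden_dim: int, id_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        # Lightweight adapter: maps the identity embedding to per-channel
        # modulation parameters (gamma, beta).
        self.to_mod = nn.Sequential(
            nn.SiLU(),
            nn.Linear(id_dim, 2 * hidden_dim),
        )

    def forward(self, x: torch.Tensor, id_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, hidden_dim); id_emb: (batch, id_dim)
        gamma, beta = self.to_mod(id_emb).chunk(2, dim=-1)
        return self.norm(x) * (1 + gamma.unsqueeze(1)) + beta.unsqueeze(1)


if __name__ == "__main__":
    layer = CondAdaNorm(hidden_dim=1024, id_dim=512)
    tokens = torch.randn(2, 256, 1024)    # video tokens in a DiT block
    identity = torch.randn(2, 512)        # facial identity embedding
    print(layer(tokens, identity).shape)  # torch.Size([2, 256, 1024])
```

Because only the modulation adapter carries identity information, a design like this keeps the added parameter count small relative to the frozen video diffusion transformer, which is consistent with the abstract's claim of minimal added parameters.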