Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers
January 7, 2025
Authors: Yuechen Zhang, Yaoyang Liu, Bin Xia, Bohao Peng, Zexin Yan, Eric Lo, Jiaya Jia
cs.AI
Abstract
We present Magic Mirror, a framework for generating identity-preserved videos
with cinematic-level quality and dynamic motion. While recent advances in video
diffusion models have shown impressive capabilities in text-to-video
generation, maintaining consistent identity while producing natural motion
remains challenging. Previous methods either require person-specific
fine-tuning or struggle to balance identity preservation with motion diversity.
Built upon Video Diffusion Transformers, our method introduces three key
components: (1) a dual-branch facial feature extractor that captures both
identity and structural features, (2) a lightweight cross-modal adapter with
Conditioned Adaptive Normalization for efficient identity integration, and (3)
a two-stage training strategy combining synthetic identity pairs with video
data. Extensive experiments demonstrate that Magic Mirror effectively balances
identity consistency with natural motion, outperforming existing methods across
multiple metrics while adding minimal parameters. The code and model
will be made publicly available at:
https://github.com/dvlab-research/MagicMirror/
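The abstract's second component, a lightweight adapter with Conditioned Adaptive Normalization, can be illustrated with a minimal sketch: an identity embedding predicts a per-channel scale and shift that modulate the normalized hidden states of a transformer block. All names, shapes, and initialization choices below are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature dimension (last axis).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class ConditionedAdaptiveNorm:
    """Hypothetical sketch of conditioned adaptive normalization:
    a small linear head maps an identity embedding to (scale, shift),
    which modulate the normalized hidden states."""

    def __init__(self, id_dim, hidden_dim, rng=None):
        rng = rng or np.random.default_rng(0)
        # Linear head: identity embedding -> concatenated (scale, shift).
        self.w = rng.normal(0.0, 0.02, size=(id_dim, 2 * hidden_dim))
        self.b = np.zeros(2 * hidden_dim)

    def __call__(self, hidden, id_embed):
        # hidden: (tokens, hidden_dim); id_embed: (id_dim,)
        scale_shift = id_embed @ self.w + self.b
        scale, shift = np.split(scale_shift, 2)
        # Scale-around-1 with zero bias keeps the pretrained backbone's
        # behavior unchanged at the start of adapter training.
        return layer_norm(hidden) * (1.0 + scale) + shift
```

With a zero identity embedding the module reduces to plain layer normalization, which is one common way such adapters are initialized so that identity conditioning is learned without disturbing the base model.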