Video Motion Transfer with Diffusion Transformers
December 10, 2024
Authors: Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov, Philip Torr, Fabio Pizzati
cs.AI
Abstract
We propose DiTFlow, a method for transferring the motion of a reference video
to a newly synthesized one, designed specifically for Diffusion Transformers
(DiT). We first process the reference video with a pre-trained DiT to analyze
cross-frame attention maps and extract a patch-wise motion signal called the
Attention Motion Flow (AMF). We guide the latent denoising process in an
optimization-based, training-free manner by optimizing latents with our AMF
loss to generate videos reproducing the motion of the reference one. We also
apply our optimization strategy to transformer positional embeddings, granting
us a boost in zero-shot motion transfer capabilities. We evaluate DiTFlow
against recently published methods, outperforming all of them across multiple
metrics and human evaluation.
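
To make the approach concrete, here is a minimal PyTorch sketch of the two ideas the abstract describes: reading a patch-wise displacement field (the AMF) off a cross-frame attention map, and taking one training-free guidance step that optimizes the latents against an AMF loss. Everything below is an assumption for illustration, not the paper's implementation: the names compute_amf, amf_guidance_step, and attn_fn, the soft-argmax readout, and the plain gradient step are hypothetical stand-ins.

```python
# Minimal sketch of the Attention Motion Flow (AMF) idea. All names,
# shapes, and hyperparameters are hypothetical illustrations, not the
# authors' implementation.
import torch
import torch.nn.functional as F

def compute_amf(q, k, grid_h, grid_w):
    """Patch-wise motion signal from a cross-frame attention map.

    q: query tokens of frame f,      shape (P, d), P = grid_h * grid_w
    k: key tokens of a later frame,  shape (P, d)
    Returns a per-patch 2D displacement field of shape (P, 2).
    """
    attn = torch.softmax(q @ k.t() / q.shape[-1] ** 0.5, dim=-1)  # (P, P)
    # Patch-centre coordinates on the token grid.
    ys, xs = torch.meshgrid(
        torch.arange(grid_h, device=q.device, dtype=q.dtype),
        torch.arange(grid_w, device=q.device, dtype=q.dtype),
        indexing="ij",
    )
    coords = torch.stack([xs.flatten(), ys.flatten()], dim=-1)  # (P, 2)
    # Soft-argmax: attention-weighted target position minus source position.
    target = attn @ coords
    return target - coords

def amf_guidance_step(latents, amf_ref, attn_fn, grid_h, grid_w, lr=0.1):
    """One training-free guidance step: nudge the latents so the generated
    video's AMF matches the reference AMF. `attn_fn` is a hypothetical hook
    returning cross-frame (q, k) tokens from the pre-trained DiT."""
    latents = latents.detach().requires_grad_(True)
    q, k = attn_fn(latents)
    loss = F.mse_loss(compute_amf(q, k, grid_h, grid_w), amf_ref)  # AMF loss
    loss.backward()
    with torch.no_grad():
        latents -= lr * latents.grad  # gradient step on the latents
    return latents.detach()
```

The soft-argmax (attention-weighted patch coordinates) is used in this sketch rather than a hard argmax so the displacement field stays differentiable with respect to the latents, which is what makes gradient-based latent optimization possible in the first place.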