Video Motion Transfer with Diffusion Transformers
December 10, 2024
Authors: Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov, Philip Torr, Fabio Pizzati
cs.AI
Abstract
We propose DiTFlow, a method for transferring the motion of a reference video
to a newly synthesized one, designed specifically for Diffusion Transformers
(DiT). We first process the reference video with a pre-trained DiT to analyze
cross-frame attention maps and extract a patch-wise motion signal called the
Attention Motion Flow (AMF). We guide the latent denoising process in an
optimization-based, training-free manner by optimizing latents with our AMF
loss to generate videos reproducing the motion of the reference one. We also
apply our optimization strategy to transformer positional embeddings, granting
us a boost in zero-shot motion transfer capabilities. We evaluate DiTFlow
against recently published methods, outperforming all across multiple metrics
and human evaluation.
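To make the two steps described in the abstract concrete, below is a minimal PyTorch sketch of one plausible reading: a patch-wise motion flow is derived from frame-to-frame attention, and the latents are nudged by a gradient step on an L2 loss between the reference and generated flows. The function names (`extract_amf`, `amf_guidance_step`, `gen_feats_fn`), the soft-correspondence formulation, and the plain gradient update are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of Attention Motion Flow (AMF) extraction and
# AMF-guided latent optimization; names and loss choice are assumptions.
import torch
import torch.nn.functional as F


def extract_amf(feats, h, w):
    """Patch-wise motion flow from cross-frame attention.

    feats: (T, N, D) patch features per frame, with N = h * w patches.
    Returns a flow of shape (T - 1, N, 2): for each patch of frame 0,
    the (dy, dx) displacement of its soft correspondence in frame t.
    """
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()  # (N, 2)

    flows = []
    q = feats[0]                                       # queries from frame 0
    for t in range(1, feats.shape[0]):
        k = feats[t]                                   # keys from frame t
        attn = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)  # (N, N)
        target = attn @ coords                         # expected position in frame t
        flows.append(target - coords)                  # displacement per patch
    return torch.stack(flows)                          # (T - 1, N, 2)


def amf_guidance_step(latents, ref_amf, gen_feats_fn, h, w, lr=0.1):
    """One training-free guidance step: move the latents so the generated
    video's AMF matches the reference AMF (L2 loss, hypothetical choice).

    gen_feats_fn stands in for running the DiT on the current latents and
    returning per-frame patch features of shape (T, N, D).
    """
    latents = latents.detach().requires_grad_(True)
    gen_amf = extract_amf(gen_feats_fn(latents), h, w)
    loss = F.mse_loss(gen_amf, ref_amf)
    loss.backward()
    with torch.no_grad():
        latents = latents - lr * latents.grad
    return latents.detach(), loss.item()
```

In this reading, `extract_amf` would be run once on the reference video's attention features to obtain `ref_amf`, and `amf_guidance_step` would then be interleaved with denoising steps; the same loss could in principle be used to optimize positional embeddings instead of latents, as the abstract mentions for the zero-shot setting.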