Mobius: Text to Seamless Looping Video Generation via Latent Shift
February 27, 2025
Authors: Xiuli Bi, Jianfei Yuan, Bo Liu, Yong Zhang, Xiaodong Cun, Chi-Man Pun, Bin Xiao
cs.AI
Abstract
We present Mobius, a novel method that generates seamlessly looping videos
directly from text descriptions without any user annotations, thereby creating
new visual materials for multimedia presentations. Our method repurposes a
pre-trained video latent diffusion model for generating looping videos from
text prompts without any training. During inference, we first construct a
latent cycle by connecting the starting and ending noise of the videos. Given
that the temporal consistency can be maintained by the context of the video
diffusion model, we perform multi-frame latent denoising by gradually shifting
the first-frame latent to the end in each step. As a result, the denoising
context varies in each step while maintaining consistency throughout the
inference process. Moreover, the latent cycle in our method can be of any
length, which lets the latent-shifting approach generate seamless looping
videos longer than the video diffusion model's context window. Unlike previous
cinemagraph methods, the proposed method does not require an input image to
define appearance, a constraint that restricts the motion of the generated
results. Instead, our method
can produce more dynamic motion and better visual quality. We conduct multiple
experiments and comparisons to verify the effectiveness of the proposed method,
demonstrating its efficacy in different scenarios. All the code will be made
available.
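
The latent-shift idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`denoise_window`, `latent_shift_loop`), the one-frame shift per step, and the fixed non-overlapping windows are all assumptions made for illustration, and the "denoiser" is a stub that simply halves each value so the code runs without a model. The point it shows is that rotating the latent cycle by one frame per step moves the window boundaries relative to the frames, so the seam between the last and first frame is regularly denoised inside a shared context.

```python
from collections import deque


def denoise_window(window, step):
    """Hypothetical stand-in for one denoising step of a video diffusion model.

    A real model would denoise all frames in the window jointly; this stub
    just halves each value so the sketch is runnable.
    """
    return [x * 0.5 for x in window]


def latent_shift_loop(noise, num_steps, context_len):
    """Denoise a cyclic sequence of per-frame latents (assumed scheme).

    noise:       list of per-frame latents (plain floats here) forming the cycle
    num_steps:   number of denoising steps
    context_len: number of frames the (hypothetical) model sees at once
    """
    cycle = deque(noise)
    n = len(cycle)
    offset = 0  # total left rotation applied so far, modulo n

    for step in range(num_steps):
        frames = list(cycle)
        # Denoise consecutive windows of at most context_len frames, so the
        # cycle may be longer than the model's context.
        for start in range(0, n, context_len):
            end = start + context_len
            frames[start:end] = denoise_window(frames[start:end], step)
        cycle = deque(frames)
        # Latent shift: move the first-frame latent to the end of the cycle.
        cycle.rotate(-1)
        offset = (offset + 1) % n

    # Undo the accumulated rotation so frames come back in their original order.
    cycle.rotate(offset)
    return list(cycle)
```

For example, `latent_shift_loop([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], num_steps=3, context_len=4)` processes a six-frame cycle with a four-frame context. Every frame is denoised once per step, but the frames sharing a window change from step to step.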