RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers

February 21, 2025
Authors: Min Zhao, Guande He, Yixiao Chen, Hongzhou Zhu, Chongxuan Li, Jun Zhu
cs.AI

Abstract

Recent advances in video generation have enabled models to synthesize high-quality, minute-long videos. However, generating even longer videos with temporal coherence remains a major challenge, and existing length extrapolation methods lead to temporal repetition or motion deceleration. In this work, we systematically analyze the role of frequency components in positional embeddings and identify an intrinsic frequency that primarily governs extrapolation behavior. Based on this insight, we propose RIFLEx, a minimal yet effective approach that reduces the intrinsic frequency to suppress repetition while preserving motion consistency, without requiring any additional modifications. RIFLEx offers a true free lunch: it achieves high-quality 2× extrapolation on state-of-the-art video diffusion transformers in a completely training-free manner. Moreover, with minimal fine-tuning that requires no long videos, it enhances quality and enables 3× extrapolation. Project page and code: https://riflex-video.github.io/
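
The idea of lowering a single frequency component can be illustrated with a minimal sketch. It is not the authors' implementation (see the project page for that) and rests on several assumptions that the abstract does not state: the temporal positional embedding is taken to be RoPE-style with frequencies `base ** (-2j / dim)`, the "intrinsic" component is taken to be the one whose period lies closest to the training length, and it is lowered just enough that one full period spans the extrapolated length. The function names and parameters are hypothetical.

```python
import math

def rope_frequencies(dim, base=10000.0):
    """Standard RoPE-style frequencies, one per pair of channels (assumed layout)."""
    return [base ** (-2.0 * j / dim) for j in range(dim // 2)]

def lower_intrinsic_frequency(freqs, train_len, target_len):
    """Return (k, new_freqs) with component k lowered; all others are unchanged.

    Assumption for illustration: the intrinsic component is the one whose
    period 2*pi/freq is closest to the training length in frames.
    """
    periods = [2.0 * math.pi / f for f in freqs]
    k = min(range(len(freqs)), key=lambda j: abs(periods[j] - train_len))
    new_freqs = list(freqs)
    # Stretch the period so it covers the extrapolated length and does not
    # wrap around (repeat) within the longer video. Never raise the frequency.
    new_freqs[k] = min(freqs[k], 2.0 * math.pi / target_len)
    return k, new_freqs

if __name__ == "__main__":
    freqs = rope_frequencies(dim=64)
    k, new_freqs = lower_intrinsic_frequency(freqs, train_len=32, target_len=64)
    print("component:", k)
    print("period before:", 2.0 * math.pi / freqs[k])
    print("period after:", 2.0 * math.pi / new_freqs[k])
```

Only one component is modified in this sketch, which mirrors the abstract's claim that no additional modifications are needed; how the actual method selects and rescales that component should be checked against the paper and the released code.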
