STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
January 6, 2025
Authors: Rui Xie, Yinhong Liu, Penghao Zhou, Chen Zhao, Jun Zhou, Kai Zhang, Zhenyu Zhang, Jian Yang, Zhenheng Yang, Ying Tai
cs.AI
Abstract
Image diffusion models have been adapted for real-world video
super-resolution to tackle over-smoothing issues in GAN-based methods. However,
these models struggle to maintain temporal consistency, as they are trained on
static images, limiting their ability to capture temporal dynamics effectively.
A straightforward remedy is to integrate text-to-video (T2V) models into video
super-resolution for improved temporal modeling. However, two key challenges remain:
artifacts introduced by complex degradations in real-world scenarios, and
compromised fidelity due to the strong generative capacity of powerful T2V
models (e.g., CogVideoX-5B). To enhance the spatio-temporal quality of
restored videos, we introduce STAR
(Spatial-Temporal Augmentation with T2V models for
Real-world video super-resolution), a novel approach that leverages
T2V models for real-world video super-resolution, achieving realistic spatial
details and robust temporal consistency. Specifically, we introduce a Local
Information Enhancement Module (LIEM) before the global attention block to
enrich local details and mitigate degradation artifacts. Moreover, we propose a
Dynamic Frequency (DF) Loss to reinforce fidelity, guiding the model to focus
on different frequency components across diffusion steps. Extensive experiments
demonstrate that STAR outperforms state-of-the-art methods on both
synthetic and real-world datasets.
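The abstract places a Local Information Enhancement Module (LIEM) before the global attention block, so each token is enriched with its spatial neighbourhood before attending to every other token. The sketch below illustrates only that ordering, assuming NumPy. The 3x3 neighbourhood mean, the mixing weight `alpha`, and the helper names `local_enhance`, `global_attention`, and `enhanced_block` are hypothetical stand-ins, not the paper's LIEM design, and the attention uses identity projections for brevity.

```python
import numpy as np


def local_enhance(x: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical local branch: add a weighted 3x3 neighbourhood mean to each token.

    x has shape (H, W, C); edge padding keeps the spatial size unchanged.
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
    local = np.zeros(x.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            local += padded[dy:dy + h, dx:dx + w, :]
    local /= 9.0
    return x + alpha * local


def global_attention(x: np.ndarray) -> np.ndarray:
    """Single-head softmax self-attention over all H*W tokens, identity Q/K/V projections."""
    h, w, c = x.shape
    tokens = x.reshape(h * w, c).astype(float)
    scores = tokens @ tokens.T / np.sqrt(c)
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights @ tokens).reshape(h, w, c)


def enhanced_block(x: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Local enhancement first, then global attention (the ordering the abstract describes)."""
    return global_attention(local_enhance(x, alpha))
```

Because global attention mixes all tokens at once, a purely global block can smear localized degradation artifacts; running a local operator first is one simple way to give each token neighbourhood context before that mixing.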
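The abstract describes the Dynamic Frequency (DF) loss only as guiding the model toward different frequency components across diffusion steps. The following is a minimal NumPy sketch of that idea under stated assumptions: an ideal circular FFT low-pass mask of radius `radius`, L1 distances, and a linear weight `c = t / num_steps` that stresses low frequencies at noisy (large `t`) steps and high frequencies near the end. The schedule, the radius, and the names `frequency_split` and `dynamic_frequency_loss` are hypothetical choices, not the paper's formulation.

```python
import numpy as np


def frequency_split(x: np.ndarray, radius: float):
    """Split a 2-D array into (low, high) parts using an ideal circular low-pass mask in the FFT domain."""
    h, w = x.shape
    spectrum = np.fft.fftshift(np.fft.fft2(x))  # move the zero frequency to (h // 2, w // 2)
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    return low, x - low


def dynamic_frequency_loss(pred, target, t, num_steps, radius=4.0):
    """Hypothetical DF-style loss: blend low- and high-frequency L1 errors with a step-dependent weight.

    Assumes t counts down from num_steps (pure noise) to 0 (clean), so c = t / num_steps
    stresses low frequencies early and high frequencies late.
    """
    low_p, high_p = frequency_split(pred, radius)
    low_t, high_t = frequency_split(target, radius)
    c = t / num_steps
    low_loss = np.mean(np.abs(low_p - low_t))
    high_loss = np.mean(np.abs(high_p - high_t))
    return c * low_loss + (1.0 - c) * high_loss
```

Swapping in a different schedule (for example a cosine ramp) or mask radius changes only `c` and `frequency_split`. The abstract does not state which choices the paper uses.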