STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

January 6, 2025
Authors: Rui Xie, Yinhong Liu, Penghao Zhou, Chen Zhao, Jun Zhou, Kai Zhang, Zhenyu Zhang, Jian Yang, Zhenheng Yang, Ying Tai
cs.AI

Abstract

Image diffusion models have been adapted for real-world video super-resolution to tackle the over-smoothing issues of GAN-based methods. However, these models struggle to maintain temporal consistency: because they are trained on static images, they cannot effectively capture temporal dynamics. A straightforward remedy is to integrate text-to-video (T2V) models into video super-resolution for better temporal modeling. Two key challenges remain, however: artifacts introduced by the complex degradations of real-world scenarios, and compromised fidelity caused by the strong generative capacity of powerful T2V models (e.g., CogVideoX-5B). To enhance the spatio-temporal quality of restored videos, we introduce STAR (Spatial-Temporal Augmentation with T2V models for Real-world video super-resolution), a novel approach that leverages T2V models for real-world video super-resolution, achieving realistic spatial details and robust temporal consistency. Specifically, we introduce a Local Information Enhancement Module (LIEM) before the global attention block to enrich local details and mitigate degradation artifacts. Moreover, we propose a Dynamic Frequency (DF) Loss to reinforce fidelity by guiding the model to focus on different frequency components across diffusion steps. Extensive experiments demonstrate that STAR outperforms state-of-the-art methods on both synthetic and real-world datasets.
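The abstract does not give the exact form of the DF loss. As a rough illustration only, the sketch below assumes one reading of "focusing on different frequency components across diffusion steps": split the reconstruction error of a frame into low- and high-frequency parts with a 2-D DFT, then weight the two parts by a normalized timestep t, emphasizing low frequencies at early, noisy steps and high frequencies at late steps. The function names, the radial `cutoff`, and the linear schedule are hypothetical choices for illustration, not STAR's published formulation.

```python
import cmath
import math
from typing import List

Frame = List[List[float]]


def dft2(img: Frame) -> List[List[complex]]:
    """Naive 2-D discrete Fourier transform (O(H^2 W^2); fine for tiny demo frames)."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(
                img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                for y in range(h)
                for x in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]


def radial_freq(k: int, n: int) -> float:
    """Map DFT index k in [0, n) to a normalized frequency magnitude in [0, 0.5]."""
    return min(k, n - k) / n


def dynamic_frequency_loss(pred: Frame, target: Frame, t: float, cutoff: float = 0.25) -> float:
    """Hypothetical timestep-weighted frequency loss (an assumption, not the paper's formula).

    t is the normalized diffusion timestep in [0, 1], with t = 1 the noisiest
    (earliest) step. The low/high split at `cutoff` and the linear schedule
    in t are illustrative choices.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    h, w = len(pred), len(pred[0])
    # By linearity, the DFT of the difference equals the difference of the DFTs.
    diff = [[pred[y][x] - target[y][x] for x in range(w)] for y in range(h)]
    err = dft2(diff)
    low, high = [], []
    for u in range(h):
        for v in range(w):
            r = math.hypot(radial_freq(u, h), radial_freq(v, w))
            (low if r <= cutoff else high).append(abs(err[u][v]))
    # Mean error magnitude within each frequency band.
    low_loss = sum(low) / len(low) if low else 0.0
    high_loss = sum(high) / len(high) if high else 0.0
    # Early (noisy) steps weight low-frequency fidelity; late steps weight high-frequency detail.
    return t * low_loss + (1.0 - t) * high_loss
```

For example, a uniform brightness offset between `pred` and `target` (a pure low-frequency error) is penalized at t = 1 but contributes almost nothing at t = 0, while a pixel-level checkerboard error behaves the opposite way. A practical version would operate on batched video latents with an FFT routine such as `torch.fft.fft2` instead of this naive DFT.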
