SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

January 2, 2025
Authors: Jianyi Wang, Zhijie Lin, Meng Wei, Yang Zhao, Ceyuan Yang, Chen Change Loy, Lu Jiang
cs.AI

Abstract

Video restoration poses non-trivial challenges in maintaining fidelity while recovering temporally consistent details from unknown degradations in the wild. Despite recent advances in diffusion-based restoration, these methods often face limitations in generation capability and sampling efficiency. In this work, we present SeedVR, a diffusion transformer designed to handle real-world video restoration with arbitrary length and resolution. The core design of SeedVR lies in the shifted window attention that facilitates effective restoration on long video sequences. SeedVR further supports variable-sized windows near the boundary of both spatial and temporal dimensions, overcoming the resolution constraints of traditional window attention. Equipped with contemporary practices, including a causal video autoencoder, mixed image and video training, and progressive training, SeedVR achieves highly competitive performance on both synthetic and real-world benchmarks, as well as AI-generated videos. Extensive experiments demonstrate SeedVR's superiority over existing methods for generic video restoration.
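
The abstract does not spell out how the windows are laid out, so the following is only a minimal sketch of the general idea, not SeedVR's implementation. It assumes a single axis (time, height, or width) of arbitrary length is split into fixed-size windows, with an optional shift, and that whatever is left over at the boundaries simply becomes a smaller window instead of forcing the length to be a multiple of the window size. The function name `window_bounds` and its parameters are hypothetical.

```python
from typing import List, Tuple


def window_bounds(length: int, window: int, shift: int = 0) -> List[Tuple[int, int]]:
    """Split the range [0, length) into half-open (start, end) windows.

    Hypothetical helper for illustration only. Interior windows have size
    `window`; windows touching the boundary may be smaller. A non-zero
    `shift` offsets the window grid, as in shifted window attention.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if length <= 0:
        return []

    shift %= window
    edges = [0]
    # With a shift, the first window is shortened to `shift` elements.
    pos = shift if shift > 0 else window
    while pos < length:
        edges.append(pos)
        pos += window
    edges.append(length)  # the last window absorbs the remainder
    return list(zip(edges[:-1], edges[1:]))


if __name__ == "__main__":
    # A length-10 axis with window size 4, without and with a shift of 2.
    print(window_bounds(10, 4))
    print(window_bounds(10, 4, shift=2))
```

Under the same assumptions, a video latent with temporal, height, and width axes could be partitioned by calling this helper once per axis and taking the Cartesian product of the resulting ranges; attention would then be computed within each resulting block.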
