SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

Abstract

Video restoration poses non-trivial challenges in maintaining fidelity while recovering temporally consistent details from unknown degradations in the wild. Despite recent advances in diffusion-based restoration, these methods often face limitations in generation capability and sampling efficiency. In this work, we present SeedVR, a diffusion transformer designed to handle real-world video restoration with arbitrary length and resolution. The core design of SeedVR lies in shifted window attention, which facilitates effective restoration on long video sequences. SeedVR further supports variable-sized windows near the boundary of both spatial and temporal dimensions, overcoming the resolution constraints of traditional window attention. Equipped with contemporary practices, including a causal video autoencoder, mixed image and video training, and progressive training, SeedVR achieves highly competitive performance on both synthetic and real-world benchmarks, as well as on AI-generated videos. Extensive experiments demonstrate SeedVR's superiority over existing methods for generic video restoration.
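The abstract names shifted window attention with variable-sized boundary windows as SeedVR's core design. Below is a minimal Python sketch, not the authors' implementation, of how such a window layout over a latent video grid could be computed: windows tile the T × H × W grid, the shifted layout offsets the grid lines by half a window on every axis, and windows that touch a boundary simply shrink instead of being padded (or, as in the original Swin Transformer, cyclically shifted and masked). The helper names (`axis_windows`, `partition_3d`, `window_size`, `attention_pairs`) and the grid and window sizes are hypothetical; attention itself would then be computed independently inside each window.

```python
import math
from itertools import product


def axis_windows(length, window, shift=0):
    """Split range(length) into half-open [start, end) windows along one axis.

    Grid lines sit at shift, shift + window, ... (or at window, 2 * window, ...
    when shift is 0). Windows touching the boundary are simply shorter, so no
    padding or masking is needed. Hypothetical helper for illustration.
    """
    if length <= 0:
        return []
    if not 0 <= shift < window:
        raise ValueError("shift must satisfy 0 <= shift < window")
    bounds = [0]
    pos = shift if shift > 0 else window
    while pos < length:
        bounds.append(pos)
        pos += window
    bounds.append(length)
    return list(zip(bounds[:-1], bounds[1:]))


def partition_3d(shape, window, shifted=False):
    """Tile a (T, H, W) latent grid with 3D windows, returned as slice triples.

    With shifted=True every axis is offset by half its window size.
    """
    per_axis = [
        axis_windows(length, win, win // 2 if shifted else 0)
        for length, win in zip(shape, window)
    ]
    return [tuple(slice(a, b) for a, b in combo) for combo in product(*per_axis)]


def window_size(win):
    """Number of latent tokens inside one window."""
    return math.prod(s.stop - s.start for s in win)


def attention_pairs(windows):
    """Query-key pairs scored when attention runs independently per window."""
    return sum(window_size(w) ** 2 for w in windows)


if __name__ == "__main__":
    shape = (5, 10, 13)  # hypothetical latent grid (T, H, W), not a multiple of the window
    window = (4, 8, 8)   # hypothetical window size per axis
    total = math.prod(shape)
    for shifted in (False, True):
        wins = partition_3d(shape, window, shifted=shifted)
        # Window sizes sum to the token count: nothing padded, nothing dropped.
        assert sum(window_size(w) for w in wins) == total
        print(f"shifted={shifted}: {len(wins)} windows, "
              f"{attention_pairs(wins)} query-key pairs vs {total ** 2} for full attention")
```

With these hypothetical sizes the regular layout yields 8 windows and the shifted layout 12, scoring 102,884 and 54,756 query-key pairs respectively, versus 422,500 for full attention over all 650 tokens. Shifted window schemes typically alternate the two layouts across layers so information can cross window borders; because boundary windows shrink, the input resolution and length never need to be multiples of the window size.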
