ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration
April 11, 2025
作者: Yongsheng Yu, Haitian Zheng, Zhifei Zhang, Jianming Zhang, Yuqian Zhou, Connelly Barnes, Yuchen Liu, Wei Xiong, Zhe Lin, Jiebo Luo
cs.AI
Abstract
Recent progress in generative models has significantly improved image
restoration capabilities, particularly through powerful diffusion models that
offer remarkable recovery of semantic details and local fidelity. However,
deploying these models at ultra-high resolutions faces a critical trade-off
between quality and efficiency due to the computational demands of long-range
attention mechanisms. To address this, we introduce ZipIR, a novel framework
that enhances efficiency, scalability, and long-range modeling for high-res
image restoration. ZipIR employs a highly compressed latent representation that
compresses image 32x, effectively reducing the number of spatial tokens, and
enabling the use of high-capacity models like the Diffusion Transformer (DiT).
Toward this goal, we propose a Latent Pyramid VAE (LP-VAE) design that
structures the latent space into sub-bands to ease diffusion training. Trained
on full images up to 2K resolution, ZipIR surpasses existing diffusion-based
methods, offering unmatched speed and quality in restoring high-resolution
images from severely degraded inputs.
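To make the efficiency claim concrete, the sketch below (not the authors' code; the 8x baseline factor and patch size are illustrative assumptions) computes how many spatial tokens a Diffusion Transformer must attend over at 2K resolution under different latent compression factors. Since self-attention cost grows quadratically with token count, tightening compression from 8x to 32x cuts the attention cost by roughly (65536/4096)^2 = 256x.

```python
# Hypothetical illustration of why a 32x-compressed latent makes DiT-style
# long-range attention tractable at 2K resolution. Not the paper's code.

def num_tokens(height: int, width: int, compression: int, patch_size: int = 1) -> int:
    """Number of spatial tokens after encoding an image into a latent grid
    downsampled by `compression` and patchified by `patch_size`."""
    lat_h, lat_w = height // compression, width // compression
    return (lat_h // patch_size) * (lat_w // patch_size)

# A 2048x2048 (2K) image:
tokens_8x = num_tokens(2048, 2048, 8)    # common VAE factor: 256*256 = 65536 tokens
tokens_32x = num_tokens(2048, 2048, 32)  # ZipIR's latent: 64*64 = 4096 tokens

# Quadratic attention cost ratio between the two settings.
attn_speedup = (tokens_8x / tokens_32x) ** 2
print(tokens_8x, tokens_32x, attn_speedup)  # 65536 4096 256.0
```

The token counts, not the compression factor itself, are what bound attention cost: a 4x tighter spatial factor yields 16x fewer tokens and therefore a roughly 256x smaller attention map.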