ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration
April 11, 2025
作者: Yongsheng Yu, Haitian Zheng, Zhifei Zhang, Jianming Zhang, Yuqian Zhou, Connelly Barnes, Yuchen Liu, Wei Xiong, Zhe Lin, Jiebo Luo
cs.AI
Abstract
Recent progress in generative models has significantly improved image
restoration capabilities, particularly through powerful diffusion models that
offer remarkable recovery of semantic details and local fidelity. However,
deploying these models at ultra-high resolutions faces a critical trade-off
between quality and efficiency due to the computational demands of long-range
attention mechanisms. To address this, we introduce ZipIR, a novel framework
that enhances efficiency, scalability, and long-range modeling for high-res
image restoration. ZipIR employs a highly compressed latent representation that
compresses image 32x, effectively reducing the number of spatial tokens, and
enabling the use of high-capacity models like the Diffusion Transformer (DiT).
Toward this goal, we propose a Latent Pyramid VAE (LP-VAE) design that
structures the latent space into sub-bands to ease diffusion training. Trained
on full images up to 2K resolution, ZipIR surpasses existing diffusion-based
methods, offering unmatched speed and quality in restoring high-resolution
images from severely degraded inputs.
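To make the efficiency claim concrete, the sketch below (not the authors' code; the 8x baseline factor and patch size are illustrative assumptions) computes how many spatial tokens a Diffusion Transformer must attend over at 2K resolution under different latent compression factors. Since self-attention cost grows quadratically with token count, tightening compression from 8x to 32x cuts the attention cost by roughly (65536/4096)^2 = 256x.

```python
# Hypothetical illustration of why a 32x-compressed latent makes DiT-style
# long-range attention tractable at 2K resolution. Not the paper's code.

def num_tokens(height: int, width: int, compression: int, patch_size: int = 1) -> int:
    """Number of spatial tokens after encoding an image into a latent grid
    downsampled by `compression` and patchified by `patch_size`."""
    lat_h, lat_w = height // compression, width // compression
    return (lat_h // patch_size) * (lat_w // patch_size)

# A 2048x2048 (2K) image:
tokens_8x = num_tokens(2048, 2048, 8)    # common VAE factor: 256*256 = 65536 tokens
tokens_32x = num_tokens(2048, 2048, 32)  # ZipIR's latent: 64*64 = 4096 tokens

# Quadratic attention cost ratio between the two settings.
attn_speedup = (tokens_8x / tokens_32x) ** 2
print(tokens_8x, tokens_32x, attn_speedup)  # 65536 4096 256.0
```

The token counts, not the compression factor itself, are what bound attention cost: a 4x tighter spatial factor yields 16x fewer tokens and therefore a roughly 256x smaller attention map.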