FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

December 12, 2024
Authors: Haonan Qiu, Shiwei Zhang, Yujie Wei, Ruihang Chu, Hangjie Yuan, Xiang Wang, Yingya Zhang, Ziwei Liu
cs.AI

Abstract

Visual diffusion models have achieved remarkable progress, yet they are typically trained at limited resolutions because high-resolution data is scarce and computation resources are constrained. This hampers their ability to generate high-fidelity images or videos at higher resolutions. Recent efforts have explored tuning-free strategies to unlock the untapped potential of pre-trained models for higher-resolution visual generation. However, these methods are still prone to producing low-quality visual content with repetitive patterns. The key obstacle lies in the inevitable increase in high-frequency information when the model generates visual content exceeding its training resolution, which leads to undesirable repetitive patterns that derive from accumulated errors. To tackle this challenge, we propose FreeScale, a tuning-free inference paradigm that enables higher-resolution visual generation via scale fusion. Specifically, FreeScale processes information from different receptive scales and then fuses it by extracting the desired frequency components. Extensive experiments validate the superiority of our paradigm in extending the capabilities of higher-resolution visual generation for both image and video models. Notably, compared with the previous best-performing method, FreeScale unlocks the generation of 8k-resolution images for the first time.
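The idea of fusing information "by extracting desired frequency components" can be illustrated with a toy one-dimensional sketch. This is not the FreeScale implementation: the function names `low_pass` and `fuse_scales`, the moving-average filter, and the choice of which input supplies the high-frequency band and which supplies the low-frequency band are all assumptions made for illustration, since the abstract does not specify them.

```python
def low_pass(signal, radius=1):
    """Moving-average low-pass filter; windows are truncated at the edges."""
    n = len(signal)
    smoothed = []
    for i in range(n):
        start = max(0, i - radius)
        stop = min(n, i + radius + 1)
        window = signal[start:stop]
        smoothed.append(sum(window) / len(window))
    return smoothed


def fuse_scales(detail_source, structure_source, radius=1):
    """Hypothetical fusion: keep the high-frequency residual of one signal
    and add the low-frequency component of the other.

    Which branch plays which role is an assumption of this sketch.
    """
    if len(detail_source) != len(structure_source):
        raise ValueError("inputs must have the same length")
    detail_low = low_pass(detail_source, radius)
    structure_low = low_pass(structure_source, radius)
    # high band of detail_source = detail_source - low_pass(detail_source)
    return [
        d - dl + sl
        for d, dl, sl in zip(detail_source, detail_low, structure_low)
    ]


if __name__ == "__main__":
    fine = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]    # rapidly varying input
    coarse = [2.0, 2.0, 2.0, 4.0, 4.0, 4.0]  # slowly varying input
    print(fuse_scales(fine, coarse))
```

Fusing a signal with itself returns the original signal, which is a quick sanity check that the two bands partition the input. A real system would operate on multi-dimensional feature maps and would likely use a different filter, such as a Gaussian blur.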
