FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
December 12, 2024
Authors: Haonan Qiu, Shiwei Zhang, Yujie Wei, Ruihang Chu, Hangjie Yuan, Xiang Wang, Yingya Zhang, Ziwei Liu
cs.AI
Abstract
Visual diffusion models have achieved remarkable progress, yet they are typically
trained at limited resolutions due to the lack of high-resolution data and
constrained computation resources, hampering their ability to generate
high-fidelity images or videos at higher resolutions. Recent efforts have
explored tuning-free strategies to reveal the untapped potential of
pre-trained models for higher-resolution visual generation. However, these
methods are still prone to producing low-quality visual content with repetitive
patterns. The key obstacle lies in the inevitable increase in high-frequency
information when the model generates visual content exceeding its training
resolution, leading to undesirable repetitive patterns deriving from the
accumulated errors. To tackle this challenge, we propose FreeScale, a
tuning-free inference paradigm to enable higher-resolution visual generation
via scale fusion. Specifically, FreeScale processes information from different
receptive scales and then fuses it by extracting desired frequency components.
Extensive experiments validate the superiority of our paradigm in extending the
capabilities of higher-resolution visual generation for both image and video
models. Notably, compared with the previous best-performing method, FreeScale
unlocks the generation of 8k-resolution images for the first time.
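
The abstract describes the fusion step only at a high level: information from different receptive scales is combined by extracting the desired frequency components. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes two same-shaped 2D feature maps and a hard circular low-pass mask in the Fourier domain; the names `low_pass_mask`, `fuse_by_frequency`, `low_source`, `high_source`, and the `cutoff` value are hypothetical, and the abstract does not say which branch supplies which frequency band or what filter is used.

```python
import numpy as np


def low_pass_mask(shape, cutoff):
    """Boolean mask that is True where the frequency radius is <= cutoff.

    Frequencies are in cycles per sample, as returned by np.fft.fftfreq,
    so each axis lies in [-0.5, 0.5).
    """
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(fy ** 2 + fx ** 2) <= cutoff


def fuse_by_frequency(low_source, high_source, cutoff=0.1):
    """Combine two real 2D maps in the Fourier domain (illustrative only).

    Takes the frequency components inside the cutoff radius from
    `low_source` and all remaining components from `high_source`.
    """
    if low_source.shape != high_source.shape:
        raise ValueError("both inputs must have the same shape")
    mask = low_pass_mask(low_source.shape, cutoff)
    spec_low = np.fft.fft2(low_source)
    spec_high = np.fft.fft2(high_source)
    fused_spec = np.where(mask, spec_low, spec_high)
    # The mask is symmetric about the origin, so the fused spectrum of two
    # real inputs is still conjugate-symmetric; the imaginary part left
    # after the inverse transform is only rounding noise.
    return np.real(np.fft.ifft2(fused_spec))
```

For example, `fuse_by_frequency(a, b, cutoff=0.1)` returns a map whose coarse structure follows `a` and whose fine detail follows `b`. A smooth filter such as a Gaussian would avoid the ringing that a hard mask can introduce; the hard mask is used here only to keep the sketch short.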