Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
January 16, 2025
Authors: Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, Saining Xie
cs.AI
Abstract
Generative models have made significant impacts across various domains,
largely due to their ability to scale during training by increasing data,
computational resources, and model size, a phenomenon characterized by
scaling laws. Recent research has begun to explore inference-time scaling
behavior in Large Language Models (LLMs), revealing how performance can further
improve with additional computation during inference. Unlike LLMs, diffusion
models inherently possess the flexibility to adjust inference-time computation
via the number of denoising steps, although the performance gains typically
flatten after a few dozen steps. In this work, we explore the inference-time scaling
behavior of diffusion models beyond increasing denoising steps and investigate
how the generation performance can further improve with increased computation.
Specifically, we consider a search problem aimed at identifying better noises
for the diffusion sampling process. We structure the design space along two
axes: the verifiers used to provide feedback, and the algorithms used to find
better noise candidates. Through extensive experiments on class-conditioned and
text-conditioned image generation benchmarks, our findings reveal that
increasing inference-time compute leads to substantial improvements in the
quality of samples generated by diffusion models, and that, given the complicated
nature of images, combinations of the components in the framework can be
chosen specifically to suit different application scenarios.
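
The search framing in the abstract (a verifier that scores samples, and an algorithm that proposes noise candidates) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `toy_sampler` and `toy_verifier` are hypothetical stand-ins for a real diffusion sampler and a real verifier, and plain random search (best-of-N) is only one simple choice of search algorithm, not necessarily the one the authors use.

```python
import random
from typing import Callable, List, Tuple

Noise = List[float]
Sample = List[float]


def toy_sampler(noise: Noise) -> Sample:
    """Hypothetical stand-in for a diffusion sampler.

    A real sampler would run the denoising process; this one is just a
    fixed deterministic map from initial noise to a "sample".
    """
    return [0.5 * z + 0.1 * z * z for z in noise]


def toy_verifier(sample: Sample) -> float:
    """Hypothetical verifier: higher scores are better.

    It rewards samples whose entries are close to 1.0. A real verifier
    would be a learned or hand-designed quality metric.
    """
    return -sum((x - 1.0) ** 2 for x in sample)


def random_search(
    sampler: Callable[[Noise], Sample],
    verifier: Callable[[Sample], float],
    num_candidates: int,
    dim: int,
    seed: int = 0,
) -> Tuple[Noise, float]:
    """Draw num_candidates Gaussian noises and keep the best-scoring one."""
    rng = random.Random(seed)
    best_noise: Noise = []
    best_score = float("-inf")
    for _ in range(num_candidates):
        noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        score = verifier(sampler(noise))
        if score > best_score:
            best_noise, best_score = noise, score
    return best_noise, best_score


if __name__ == "__main__":
    # Spending more inference-time compute here means scoring more candidates.
    for n in (1, 4, 16, 64):
        _, score = random_search(toy_sampler, toy_verifier, n, dim=8)
        print(f"candidates={n:3d}  best verifier score={score:.4f}")
```

With a fixed seed, the candidates drawn for a small budget are a prefix of those drawn for a larger one, so the best score in this toy setup cannot decrease as the budget grows. That property comes from the sketch's construction and says nothing about how real verifiers or samplers behave.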