Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
October 14, 2024
Authors: Cheng Lu, Yang Song
cs.AI
Abstract
Consistency models (CMs) are a powerful class of diffusion-based generative
models optimized for fast sampling. Most existing CMs are trained using
discretized timesteps, which introduce additional hyperparameters and are prone
to discretization errors. While continuous-time formulations can mitigate these
issues, their success has been limited by training instability. To address
this, we propose a simplified theoretical framework that unifies previous
parameterizations of diffusion models and CMs, identifying the root causes of
instability. Based on this analysis, we introduce key improvements in diffusion
process parameterization, network architecture, and training objectives. These
changes enable us to train continuous-time CMs at an unprecedented scale,
reaching 1.5B parameters on ImageNet 512x512. Our proposed training algorithm,
using only two sampling steps, achieves FID scores of 2.06 on CIFAR-10, 1.48 on
ImageNet 64x64, and 1.88 on ImageNet 512x512, narrowing the gap in FID scores
with the best existing diffusion models to within 10%.
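The abstract reports results with only two sampling steps. Below is a minimal sketch of the standard two-step (multistep) consistency-model sampling loop for intuition: denoise pure noise in one jump, re-noise to an intermediate level, then denoise once more. The `f_theta` network is a placeholder standing in for a trained consistency model, and the noise levels (`sigma_max`, `sigma_mid`, `sigma_min`) are illustrative EDM-style values, not the exact schedule or parameterization used in this paper.

```python
# Sketch of two-step consistency sampling (illustrative only).
import torch

def f_theta(x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Placeholder consistency function mapping (x_t, t) directly to a clean
    # sample; in practice this is a large trained network.
    return x_t / (1.0 + t.view(-1, 1, 1, 1) ** 2).sqrt()

@torch.no_grad()
def two_step_sample(shape, sigma_max=80.0, sigma_mid=0.8, sigma_min=0.002,
                    device="cpu"):
    # Step 1: denoise pure noise at the maximal noise level in one jump.
    x = sigma_max * torch.randn(shape, device=device)
    t = torch.full((shape[0],), sigma_max, device=device)
    x0 = f_theta(x, t)

    # Step 2: re-noise to an intermediate level, then denoise once more.
    noise = torch.randn_like(x0)
    x_mid = x0 + (sigma_mid ** 2 - sigma_min ** 2) ** 0.5 * noise
    t_mid = torch.full((shape[0],), sigma_mid, device=device)
    return f_theta(x_mid, t_mid)

samples = two_step_sample((4, 3, 64, 64))
print(samples.shape)  # torch.Size([4, 3, 64, 64])
```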