ChatPaper.ai


Stable Consistency Tuning: Understanding and Improving Consistency Models

October 24, 2024
作者: Fu-Yun Wang, Zhengyang Geng, Hongsheng Li
cs.AI

Abstract

Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising. In contrast, consistency models, a new generative family, achieve competitive performance with significantly faster sampling. These models are trained either through consistency distillation, which leverages pretrained diffusion models, or consistency training/tuning directly from raw data. In this work, we propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as value estimation through Temporal Difference (TD) Learning. More importantly, this framework allows us to analyze the limitations of current consistency training/tuning strategies. Built upon Easy Consistency Tuning (ECT), we propose Stable Consistency Tuning (SCT), which incorporates variance-reduced learning using the score identity. SCT leads to significant performance improvements on benchmarks such as CIFAR-10 and ImageNet-64. On ImageNet-64, SCT achieves 1-step FID 2.42 and 2-step FID 1.55, a new SoTA for consistency models.
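The TD-learning framing in the abstract can be illustrated with a toy sketch: a consistency function at timestep t is regressed toward a stop-gradient evaluation of itself at the adjacent, less-noisy timestep, bootstrapping from the boundary condition f(x, 0) = x. This is a minimal 1-D illustration with an invented linear parameterization and toy noise schedule, not the paper's actual ECT/SCT training objective or variance-reduction scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
w = 0.5  # parameter of a toy linear consistency function f(x, t) = w * x

def f(w, x, t):
    # Boundary condition of consistency models: f(x, 0) = x.
    return x if t == 0 else w * x

lr = 0.01
for step in range(2000):
    x0 = rng.normal()               # clean data sample
    t = int(rng.integers(1, 4))     # discrete timestep (toy range)
    noise = rng.normal()
    x_t = x0 + np.sqrt(t) * noise       # noised sample at step t
    x_s = x0 + np.sqrt(t - 1) * noise   # adjacent, less-noisy sample

    # TD-style target: evaluate the function one step earlier and treat it
    # as a constant (stop-gradient), analogous to bootstrapped value targets.
    target = f(w, x_s, t - 1)
    pred = f(w, x_t, t)

    # SGD step on the squared TD error w.r.t. w only (target held fixed).
    grad = 2.0 * (pred - target) * x_t
    w -= lr * grad
```

Because the target at t = 1 reduces to the clean sample itself via the boundary condition, the regression propagates the denoising signal backward through timesteps, which is the value-estimation analogy the paper develops; the variance of this single-sample target is what SCT's score-identity-based variance reduction addresses.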

