Improved Training Technique for Latent Consistency Models
February 3, 2025
Authors: Quan Dao, Khanh Doan, Di Liu, Trung Le, Dimitris Metaxas
cs.AI
Abstract
Consistency models are a new family of generative models capable of producing
high-quality samples in either a single step or multiple steps. Recently,
consistency models have demonstrated impressive performance, achieving results
on par with diffusion models in the pixel space. However, the success of
scaling consistency training to large-scale datasets, particularly for
text-to-image and video generation tasks, is determined by performance in the
latent space. In this work, we analyze the statistical differences between
pixel and latent spaces, discovering that latent data often contains highly
impulsive outliers, which significantly degrade the performance of improved
consistency training (iCT) in the latent space. To address this, we replace
Pseudo-Huber losses with Cauchy
losses, effectively mitigating the impact of outliers. Additionally, we
introduce a diffusion loss at early timesteps and employ optimal transport (OT)
coupling to further enhance performance. Lastly, we introduce the adaptive
scaling-c scheduler to manage the robust training process and adopt
Non-scaling LayerNorm in the architecture to better capture the statistics of
the features and reduce outlier impact. With these strategies, we successfully
train latent consistency models capable of high-quality sampling with one or
two steps, significantly narrowing the performance gap between latent
consistency and diffusion models. The implementation is released here:
https://github.com/quandao10/sLCT/
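The loss replacement described above can be illustrated with a minimal sketch. This is not the paper's implementation (see the linked repository for that); it only contrasts the two robust losses on per-element residuals, with the constant c chosen arbitrarily here — the paper manages it with an adaptive scaling-c scheduler:

```python
import numpy as np

def pseudo_huber_loss(x, y, c):
    """Pseudo-Huber loss (used in iCT): quadratic near zero,
    but linear in the tails, so a large outlier residual still
    contributes a gradient of roughly constant magnitude."""
    return np.mean(np.sqrt((x - y) ** 2 + c ** 2) - c)

def cauchy_loss(x, y, c):
    """Cauchy (Lorentzian) loss: logarithmic tails. The gradient
    w.r.t. a residual r is 2r / (c**2 + r**2), which vanishes as
    r grows, so impulsive latent outliers are heavily down-weighted."""
    return np.mean(np.log1p(((x - y) / c) ** 2))

# One impulsive outlier among small residuals, mimicking latent data:
x = np.zeros(4)
y = np.array([0.1, 0.2, 0.1, 50.0])
print(pseudo_huber_loss(x, y, c=1.0))  # dominated by the outlier
print(cauchy_loss(x, y, c=1.0))        # outlier contributes only log-scale
```

The key design point is the tail behavior: both losses agree (up to scale) on small residuals, but the Cauchy loss bounds the outlier's gradient contribution, which is what makes consistency training stable on impulsive latent statistics.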