Improved Training Technique for Latent Consistency Models

Abstract

Consistency models are a new family of generative models that can produce high-quality samples in a single step or in multiple steps. Recently, consistency models have demonstrated impressive performance, achieving results on par with diffusion models in pixel space. However, scaling consistency training to large datasets, particularly for text-to-image and video generation, depends on performance in the latent space. In this work, we analyze the statistical differences between pixel and latent spaces and find that latent data often contains highly impulsive outliers, which significantly degrade the performance of iCT (improved consistency training) in the latent space. To address this, we replace Pseudo-Huber losses with Cauchy losses, which effectively mitigates the impact of outliers. Additionally, we introduce a diffusion loss at early timesteps and employ optimal transport (OT) coupling to further enhance performance. Lastly, we introduce an adaptive scaling-c scheduler to manage the robust training process and adopt Non-scaling LayerNorm in the architecture to better capture feature statistics and reduce the impact of outliers. With these strategies, we successfully train latent consistency models capable of high-quality sampling in one or two steps, significantly narrowing the performance gap between latent consistency models and diffusion models. The implementation is released here: https://github.com/quandao10/sLCT/
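
To illustrate why swapping the Pseudo-Huber loss for a Cauchy loss can dampen impulsive outliers, the following minimal sketch compares how the two losses grow with the residual norm. It uses the commonly cited forms sqrt(r^2 + c^2) - c and log(1 + r^2 / (2 c^2)); the function names, the value c = 1.0, and the sample residuals are assumptions chosen for illustration and are not taken from the released implementation.

```python
import math


def pseudo_huber(residual_norm: float, c: float) -> float:
    """Pseudo-Huber loss on a residual L2 norm r: sqrt(r^2 + c^2) - c."""
    return math.sqrt(residual_norm ** 2 + c ** 2) - c


def cauchy(residual_norm: float, c: float) -> float:
    """Cauchy (Lorentzian) loss on a residual L2 norm r: log(1 + r^2 / (2 c^2))."""
    return math.log1p(residual_norm ** 2 / (2.0 * c ** 2))


# Illustrative scale only (an assumption, not the paper's setting).
c = 1.0

# Small residuals behave similarly; large (outlier) residuals do not.
for r in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(f"r={r:8.1f}  pseudo_huber={pseudo_huber(r, c):10.4f}  cauchy={cauchy(r, c):8.4f}")
```

For large residuals the Pseudo-Huber loss grows roughly linearly in r, while the Cauchy loss grows only logarithmically, so a single extreme latent value contributes far less to the total loss under the Cauchy form.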
