FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

Abstract

In this paper, we present FasterCache, a novel training-free strategy designed to accelerate the inference of video diffusion models while preserving high-quality generation. By analyzing existing cache-based methods, we observe that directly reusing adjacent-step features degrades video quality because it discards the subtle variations between steps. We further perform a pioneering investigation of the acceleration potential of classifier-free guidance (CFG) and reveal significant redundancy between conditional and unconditional features within the same timestep. Capitalizing on these observations, we introduce FasterCache to substantially accelerate diffusion-based video generation. Our key contributions include a dynamic feature reuse strategy that preserves both feature distinction and temporal continuity, and CFG-Cache, which optimizes the reuse of conditional and unconditional outputs to further enhance inference speed without compromising video quality. We empirically evaluate FasterCache on recent video diffusion models. Experimental results show that FasterCache can significantly accelerate video generation (e.g., a 1.67× speedup on Vchitect-2.0) while keeping video quality comparable to the baseline, and that it consistently outperforms existing methods in both inference speed and video quality.
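The abstract does not give the exact update rules, so the following is only a minimal Python sketch of the two ideas it names, under stated assumptions; it is not the authors' implementation. It assumes that (a) "dynamic feature reuse" can be illustrated as extrapolating from the last two cached features instead of copying the previous one verbatim, and (b) the conditional/unconditional redundancy can be exploited by caching their residual at a fully computed step and approximating the unconditional output from a fresh conditional output on skipped steps. The names `extrapolate_feature`, `CFGResidualCache`, and the `weight` parameter are hypothetical; `cfg_combine` is the standard classifier-free guidance formula.

```python
from typing import List, Optional

Vector = List[float]


def cfg_combine(eps_cond: Vector, eps_uncond: Vector, scale: float) -> Vector:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def extrapolate_feature(f_prev: Vector, f_prev2: Vector, weight: float) -> Vector:
    """Hypothetical dynamic reuse: rather than copying f_prev unchanged,
    add a weighted estimate of how the feature changed between the two
    most recent computed steps, so subtle variations are not discarded."""
    return [a + weight * (a - b) for a, b in zip(f_prev, f_prev2)]


class CFGResidualCache:
    """Hypothetical CFG reuse: store (cond - uncond) at a fully computed
    step, then on skipped steps approximate uncond as cond - residual."""

    def __init__(self) -> None:
        self.residual: Optional[Vector] = None

    def step(self, eps_cond: Vector, eps_uncond: Optional[Vector],
             scale: float) -> Vector:
        if eps_uncond is not None:
            # Full step: both branches were evaluated; refresh the cache.
            self.residual = [c - u for c, u in zip(eps_cond, eps_uncond)]
        elif self.residual is None:
            raise ValueError("the first step must compute the unconditional branch")
        else:
            # Skipped step: reuse the cached residual instead of running
            # the unconditional forward pass.
            eps_uncond = [c - r for c, r in zip(eps_cond, self.residual)]
        return cfg_combine(eps_cond, eps_uncond, scale)
```

For example, `CFGResidualCache().step([1.0, 2.0], [0.5, 1.0], 2.0)` computes both branches and caches the residual `[0.5, 1.0]`; a following call `step([2.0, 3.0], None, 2.0)` skips the unconditional pass and yields `[2.5, 4.0]`. How often full steps are interleaved with skipped ones, and how the extrapolation weight should vary over timesteps, are left open here because the abstract does not specify them.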