Mobile Video Diffusion

December 10, 2024
Authors: Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas, Amir Ghodrati, Amirhossein Habibian
cs.AI

Abstract

Video diffusion models have achieved impressive realism and controllability but are limited by high computational demands, restricting their use on mobile devices. This paper introduces the first mobile-optimized video diffusion model. Starting from the spatio-temporal UNet of Stable Video Diffusion (SVD), we reduce memory and computational cost by lowering the frame resolution, incorporating multi-scale temporal representations, and introducing two novel pruning schemas that reduce the number of channels and temporal blocks. Furthermore, we employ adversarial finetuning to reduce denoising to a single step. Our model, coined MobileVD, is 523x more efficient (1817.2 vs. 4.34 TFLOPs) with a slight quality drop (FVD 149 vs. 171), generating latents for a 14x512x256 px clip in 1.7 seconds on a Xiaomi 14 Pro. Our results are available at https://qualcomm-ai-research.github.io/mobile-video-diffusion/
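The key efficiency lever described above, collapsing iterative denoising into a single network evaluation via adversarial finetuning, can be illustrated with a toy sketch. The `denoiser` below is a hypothetical stand-in, not SVD's actual UNet; the point is only that conventional sampling pays one network call per step, while the distilled model pays exactly one:

```python
import numpy as np

def denoiser(x, t):
    # Hypothetical stand-in for the UNet: nudges the latent toward zero
    # (the "clean" signal in this toy setup) proportionally to the noise level t.
    return x * (1.0 - t)

def multi_step_sample(x_noisy, steps=25):
    # Conventional diffusion sampling: the network is evaluated once per
    # timestep, so compute scales linearly with the number of steps.
    x = x_noisy
    for t in np.linspace(1.0, 0.0, steps, endpoint=False):
        x = denoiser(x, t)
    return x, steps  # `steps` network evaluations

def one_step_sample(x_noisy):
    # After adversarial finetuning (as in MobileVD), a single evaluation
    # maps noise directly to the clean latent.
    return denoiser(x_noisy, 1.0), 1  # one network evaluation

rng = np.random.default_rng(0)
latent = rng.standard_normal((14, 32, 16, 4))  # toy latent, not SVD's real shape
_, n_multi = multi_step_sample(latent)
_, n_one = one_step_sample(latent)
print(n_multi, n_one)  # 25 vs. 1 network evaluations
```

On-device latency is dominated by these network evaluations, which is why single-step distillation compounds with the architectural pruning described in the abstract.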
