MoViE: Mobile Diffusion for Video Editing

Abstract

Recent progress in diffusion-based video editing has shown remarkable potential for practical applications. However, these methods remain prohibitively expensive and difficult to deploy on mobile devices. In this study, we introduce a series of optimizations that make mobile video editing feasible. Building on an existing image editing model, we first optimize its architecture and incorporate a lightweight autoencoder. We then extend classifier-free guidance distillation to multiple modalities, which yields a threefold on-device speedup. Finally, we reduce the number of sampling steps to one by introducing a novel adversarial distillation scheme that preserves the controllability of the editing process. Together, these optimizations enable video editing at 12 frames per second on mobile devices while maintaining high quality. Our results are available at https://qualcomm-ai-research.github.io/mobile-video-editing/
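To illustrate where a threefold speedup from guidance distillation could come from, the sketch below assumes a two-condition classifier-free guidance rule of the kind used by instruction-based image editors, in which each sampling step combines three denoiser evaluations (unconditional, image-only, image plus text). This formulation, the function name guided_noise, the CountingDenoiser stand-in, and the scale values are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Optional, Sequence


class CountingDenoiser:
    """Toy stand-in for a diffusion denoiser (hypothetical, not the paper's model).

    It only shifts the latent depending on which conditions are present,
    and counts how many forward passes were requested.
    """

    def __init__(self) -> None:
        self.calls = 0

    def __call__(
        self,
        latent: Sequence[float],
        image_cond: Optional[str],
        text_cond: Optional[str],
    ) -> List[float]:
        self.calls += 1
        shift = 0.0
        if image_cond is not None:
            shift += 0.5
        if text_cond is not None:
            shift += 0.25
        return [x + shift for x in latent]


def guided_noise(denoiser, latent, image_cond, text_cond, s_img, s_txt):
    """Two-condition classifier-free guidance (assumed formulation).

    Needs three forward passes per sampling step.
    """
    e_uncond = denoiser(latent, None, None)           # no conditions
    e_img = denoiser(latent, image_cond, None)        # image condition only
    e_full = denoiser(latent, image_cond, text_cond)  # image and text
    return [
        u + s_img * (i - u) + s_txt * (f - i)
        for u, i, f in zip(e_uncond, e_img, e_full)
    ]


teacher = CountingDenoiser()
combined = guided_noise(teacher, [0.0, 1.0], "frame", "prompt", 1.5, 7.5)
print(teacher.calls)  # number of forward passes used for one guided step
```

Under this assumption, a distilled student that takes the two guidance scales as extra inputs and predicts the combined estimate directly would need one forward pass per step instead of three, which is consistent with the threefold speedup reported above.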
