StableV2V: Stablizing Shape Consistency in Video-to-Video Editing

November 17, 2024
Authors: Chang Liu, Rui Li, Kaidong Zhang, Yunwei Lan, Dong Liu
cs.AI

Abstract

Recent advances in generative AI have significantly promoted content creation and editing, and prevailing studies have extended this progress to video editing. These studies mainly transfer the inherent motion patterns of the source video to the edited one; however, because the transferred motions are not explicitly aligned with the edited contents, the results often show poor consistency with the user prompts. To address this limitation, we present StableV2V, a shape-consistent video editing method. Our method decomposes the entire editing pipeline into several sequential procedures: it edits the first video frame, then establishes an alignment between the transferred motions and the user prompts, and finally propagates the edited contents to all other frames based on this alignment. Furthermore, we curate a testing benchmark, DAVIS-Edit, for comprehensive evaluation of video editing across various types of prompts and difficulty levels. Experimental results and analyses demonstrate the superior performance, visual consistency, and inference efficiency of our method compared with existing state-of-the-art studies.
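
To make the three-stage decomposition in the abstract concrete, here is a minimal Python sketch of its control flow. Every function name, signature, and type alias below is a hypothetical placeholder chosen for illustration; it is not the authors' released implementation or API.

```python
# Hypothetical sketch of the StableV2V three-stage pipeline described
# in the abstract. All names here are illustrative placeholders, not
# the authors' actual code.
from typing import Any, List

Frame = Any   # e.g. an HxWx3 image array
Motion = Any  # e.g. per-frame optical flow or depth maps

def estimate_motion(frames: List[Frame]) -> List[Motion]:
    """Extract motion cues (e.g. optical flow) from the source video."""
    raise NotImplementedError  # placeholder: any flow/depth estimator

def edit_first_frame(frame: Frame, prompt: str) -> Frame:
    """Stage 1: edit only the first frame with an off-the-shelf
    text-guided image editor."""
    raise NotImplementedError

def align_motion_with_prompt(motions: List[Motion],
                             edited: Frame) -> List[Motion]:
    """Stage 2: adjust the extracted motions so they agree with the
    (possibly reshaped) edited content, rather than reusing the
    source motions verbatim."""
    raise NotImplementedError

def propagate(edited: Frame, motions: List[Motion]) -> List[Frame]:
    """Stage 3: generate the remaining frames from the edited first
    frame, guided by the aligned motions."""
    raise NotImplementedError

def stablev2v(frames: List[Frame], prompt: str) -> List[Frame]:
    """Run the stages sequentially: edit, align, then propagate."""
    source_motion = estimate_motion(frames)
    first = edit_first_frame(frames[0], prompt)
    aligned = align_motion_with_prompt(source_motion, first)
    return [first] + propagate(first, aligned)
```

The point of the sketch is the ordering: the motion-prompt alignment sits between first-frame editing and propagation, which is what distinguishes this pipeline from approaches that copy source motion directly onto the edited content.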
