StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
November 17, 2024
Authors: Chang Liu, Rui Li, Kaidong Zhang, Yunwei Lan, Dong Liu
cs.AI
Abstract
Recent advancements in generative AI have significantly promoted content
creation and editing, and prevailing studies further extend this progress to
video editing. These studies mainly transfer the inherent motion patterns
from the source video to the edited one; however, because the delivered
motions are not explicitly aligned with the edited contents, the results are
often inconsistent with the user prompts. To address this limitation, we
present StableV2V, a shape-consistent video editing method. Our method
decomposes the entire editing pipeline into several sequential procedures: it
edits the first video frame, then establishes an alignment between the
delivered motions and the user prompts, and finally propagates the edited
contents to all other frames based on this alignment. Furthermore, we curate
a testing benchmark, DAVIS-Edit, for a comprehensive evaluation of video
editing across various types of prompts and difficulty levels. Experimental
results and analyses demonstrate the superior performance, visual
consistency, and inference efficiency of our method compared with existing
state-of-the-art approaches.
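The abstract describes a three-stage pipeline: first-frame editing, alignment of the delivered motions with the edited content, and propagation to the remaining frames. The Python snippet below is a minimal sketch of that control flow only; the parameters edit_first_frame, align_motion, and propagate are hypothetical placeholders, not components from the paper.

```python
# Hypothetical sketch of the sequential pipeline outlined in the abstract.
# Each stage is passed in as a callable; none of these names come from the
# paper's actual implementation.
from typing import Any, Callable, List

Frame = Any  # placeholder for an image array/tensor type


def edit_video(
    frames: List[Frame],
    prompt: str,
    edit_first_frame: Callable[[Frame, str], Frame],
    align_motion: Callable[[List[Frame], Frame, str], Any],
    propagate: Callable[[Frame, Frame, Any], Frame],
) -> List[Frame]:
    # Stage 1: edit only the first frame according to the user prompt.
    edited_first = edit_first_frame(frames[0], prompt)

    # Stage 2: align the motion extracted from the source video with the
    # edited content, so the delivered motion matches the new shape.
    alignment = align_motion(frames, edited_first, prompt)

    # Stage 3: propagate the edited content to all other frames, guided by
    # the shape-consistent alignment established in stage 2.
    return [edited_first] + [
        propagate(frame, edited_first, alignment) for frame in frames[1:]
    ]
```

Passing the stages as callables keeps the sketch self-contained while making explicit that, per the abstract, each stage is an independent step executed in sequence.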