MIVE: New Design and Benchmark for Multi-Instance Video Editing

Abstract

Recent AI-based video editing has enabled users to edit videos through simple text prompts, significantly simplifying the editing process. However, recent zero-shot video editing techniques primarily focus on global or single-object edits, which can lead to unintended changes in other parts of the video. When multiple objects require localized edits, existing methods face challenges such as unfaithful editing, editing leakage, and a lack of suitable evaluation datasets and metrics. To overcome these limitations, we propose a zero-shot Multi-Instance Video Editing framework, called MIVE. MIVE is a general-purpose mask-based framework that is not dedicated to specific objects (e.g., people). MIVE introduces two key modules: (i) Disentangled Multi-instance Sampling (DMS) to prevent editing leakage and (ii) Instance-centric Probability Redistribution (IPR) to ensure precise localization and faithful editing. Additionally, we present our new MIVE Dataset featuring diverse video scenarios and introduce the Cross-Instance Accuracy (CIA) Score to evaluate editing leakage in multi-instance video editing tasks. Our extensive qualitative, quantitative, and user study evaluations demonstrate that MIVE significantly outperforms recent state-of-the-art methods in terms of editing faithfulness, accuracy, and leakage prevention, setting a new benchmark for multi-instance video editing. The project page is available at https://kaist-viclab.github.io/mive-site/
