Multi-subject Open-set Personalization in Video Generation

January 10, 2025
Authors: Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Yuwei Fang, Kwot Sin Lee, Ivan Skorokhodov, Kfir Aberman, Jun-Yan Zhu, Ming-Hsuan Yang, Sergey Tulyakov
cs.AI

Abstract

Video personalization methods allow us to synthesize videos with specific concepts such as people, pets, and places. However, existing methods often focus on limited domains, require time-consuming optimization per subject, or support only a single subject. We present Video Alchemist - a video model with built-in multi-subject, open-set personalization capabilities for both foreground objects and background, eliminating the need for time-consuming test-time optimization. Our model is built on a new Diffusion Transformer module that fuses each conditional reference image and its corresponding subject-level text prompt with cross-attention layers. Developing such a large model presents two main challenges: dataset and evaluation. First, as paired datasets of reference images and videos are extremely hard to collect, we sample selected video frames as reference images and synthesize a clip of the target video. However, while models can easily denoise training videos given reference frames, they fail to generalize to new contexts. To mitigate this issue, we design a new automatic data construction pipeline with extensive image augmentations. Second, evaluating open-set video personalization is a challenge in itself. To address this, we introduce a personalization benchmark that focuses on accurate subject fidelity and supports diverse personalization scenarios. Finally, our extensive experiments show that our method significantly outperforms existing personalization methods in both quantitative and qualitative evaluations.
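
The fusion step described in the abstract, where each reference image is tied to its subject-level text prompt and injected through cross-attention, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`bind_subject`, `personalize`), the choice to bind by adding the word embedding to the image tokens, the absence of learned projections, and the random vectors standing in for real encoders are all assumptions made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Single-head cross-attention with no learned projections (illustrative only)."""
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, context)) for d in range(dim)]
        )
    return outputs


def bind_subject(image_tokens, word_embedding):
    """Hypothetical binding: add the subject's word embedding to each image token."""
    return [[t + w for t, w in zip(tok, word_embedding)] for tok in image_tokens]


def personalize(video_tokens, subjects):
    """Attend from video tokens to the bound tokens of every subject.

    `subjects` is a list of (image_tokens, word_embedding) pairs, so any number
    of subjects can be supplied without per-subject optimization.
    """
    context = []
    for image_tokens, word_embedding in subjects:
        context.extend(bind_subject(image_tokens, word_embedding))
    attended = cross_attention(video_tokens, context)
    # Residual connection, as is conventional in transformer blocks.
    return [[x + a for x, a in zip(tok, att)] for tok, att in zip(video_tokens, attended)]


# Toy data: random vectors stand in for real video, image, and text encoders.
rng = random.Random(0)
DIM = 8


def rand_tokens(count):
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(count)]


video_tokens = rand_tokens(4)
subjects = [
    (rand_tokens(3), rand_tokens(1)[0]),  # e.g. a person
    (rand_tokens(3), rand_tokens(1)[0]),  # e.g. a background scene
]
out = personalize(video_tokens, subjects)
```

Because the context is a plain concatenation of per-subject tokens, adding a third subject only lengthens the context list; the shape of the video tokens is unchanged.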
