Multi-subject Open-set Personalization in Video Generation
January 10, 2025
Authors: Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Yuwei Fang, Kwot Sin Lee, Ivan Skorokhodov, Kfir Aberman, Jun-Yan Zhu, Ming-Hsuan Yang, Sergey Tulyakov
cs.AI
Abstract
Video personalization methods allow us to synthesize videos with specific
concepts such as people, pets, and places. However, existing methods often
focus on limited domains, require time-consuming optimization per subject, or
support only a single subject. We present Video Alchemist, a video model
with built-in multi-subject, open-set personalization capabilities for both
foreground objects and background, eliminating the need for time-consuming
test-time optimization. Our model is built on a new Diffusion Transformer
module that fuses each conditional reference image and its corresponding
subject-level text prompt with cross-attention layers. Developing such a large
model presents two main challenges: dataset and evaluation. First, as paired
datasets of reference images and videos are extremely hard to collect, we
sample selected video frames as reference images and synthesize a clip of the
target video. However, while models can easily denoise training videos given
reference frames, they fail to generalize to new contexts. To mitigate this
issue, we design a new automatic data construction pipeline with extensive
image augmentations. Second, evaluating open-set video personalization is a
challenge in itself. To address this, we introduce a personalization benchmark
that focuses on accurate subject fidelity and supports diverse personalization
scenarios. Finally, our extensive experiments show that our method
significantly outperforms existing personalization methods in both quantitative
and qualitative evaluations.
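
The fusion step described in the abstract, where each reference image is paired with its subject-level text prompt and injected through cross-attention, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`bind_subject`, `cross_attention`, `personalize`), the use of addition to bind a word embedding to image tokens, the absence of learned projections, and the residual update are all assumptions made for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bind_subject(word_embedding, image_tokens):
    # Assumption: bind a subject's word embedding to each of its reference
    # image tokens by addition. A real model would likely use a learned layer.
    return [[w + t for w, t in zip(word_embedding, tok)] for tok in image_tokens]


def cross_attention(queries, context):
    # Single-head attention with keys == values == context and no learned
    # projections (a simplification for this sketch).
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in context])
        outputs.append(
            [sum(w * k[i] for w, k in zip(weights, context)) for i in range(dim)]
        )
    return outputs


def personalize(video_tokens, subjects):
    # Concatenate the bound tokens of every subject into one conditioning
    # sequence, so several subjects can be attended to at once.
    context = []
    for word_embedding, image_tokens in subjects:
        context.extend(bind_subject(word_embedding, image_tokens))
    attended = cross_attention(video_tokens, context)
    # Residual update of the video tokens (assumed).
    return [[v + a for v, a in zip(tok, att)] for tok, att in zip(video_tokens, attended)]


rng = random.Random(0)


def rand_tokens(count, dim):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


dim = 8
video_tokens = rand_tokens(6, dim)
subjects = [
    (rand_tokens(1, dim)[0], rand_tokens(4, dim)),  # e.g. a person
    (rand_tokens(1, dim)[0], rand_tokens(3, dim)),  # e.g. a background
]
output = personalize(video_tokens, subjects)
print(len(output), len(output[0]))
```

Because the conditioning sequence is a plain concatenation, adding another subject only lengthens the context and leaves the video token shape unchanged, which is one plausible reading of how multi-subject conditioning can work without per-subject optimization.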