Fashion-VDM: Video Diffusion Model for Virtual Try-On
October 31, 2024
Authors: Johanna Karras, Yingwei Li, Nan Liu, Luyang Zhu, Innfarn Yoo, Andreas Lugmayr, Chris Lee, Ira Kemelmacher-Shlizerman
cs.AI
Abstract
We present Fashion-VDM, a video diffusion model (VDM) for generating virtual
try-on videos. Given an input garment image and person video, our method aims
to generate a high-quality try-on video of the person wearing the given
garment, while preserving the person's identity and motion. Image-based virtual
try-on has shown impressive results; however, existing video virtual try-on
(VVT) methods still lack garment details and temporal consistency. To
address these issues, we propose a diffusion-based architecture for video
virtual try-on, split classifier-free guidance for increased control over the
conditioning inputs, and a progressive temporal training strategy for
single-pass 64-frame, 512px video generation. We also demonstrate the
effectiveness of joint image-video training for video try-on, especially when
video data is limited. Our qualitative and quantitative experiments show that
our approach sets the new state-of-the-art for video virtual try-on. For
additional results, visit our project page:
https://johannakarras.github.io/Fashion-VDM.
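The "split classifier-free guidance" mentioned above refers to giving each conditioning input (the person video and the garment image) its own guidance weight, rather than applying a single scale to the joint condition. Below is a minimal sketch of one common multi-condition CFG decomposition; the denoiser signature, the use of None for dropped conditions, and the weight values are illustrative assumptions, not Fashion-VDM's published implementation.

```python
def split_cfg_denoise(denoiser, z_t, t, person_cond, garment_cond,
                      w_person=2.0, w_garment=2.5):
    """Split (multi-condition) classifier-free guidance.

    Instead of one guidance scale on the joint condition, each condition
    gets its own weight. `denoiser(z_t, t, person, garment)` predicts
    noise; passing None stands for a null condition learned via
    conditioning dropout at training time. Names, signature, and weights
    are illustrative assumptions, not the paper's exact formulation.
    """
    eps_uncond = denoiser(z_t, t, None, None)               # fully unconditional
    eps_person = denoiser(z_t, t, person_cond, None)        # person video only
    eps_both = denoiser(z_t, t, person_cond, garment_cond)  # person + garment

    # Guide along each condition's direction with its own scale.
    return (eps_uncond
            + w_person * (eps_person - eps_uncond)
            + w_garment * (eps_both - eps_person))
```

Splitting the guidance this way lets the garment weight be raised to sharpen clothing detail without over-amplifying the person conditioning.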