Fashion-VDM: Video Diffusion Model for Virtual Try-On

Abstract

We present Fashion-VDM, a video diffusion model (VDM) for generating virtual try-on videos. Given an input garment image and a person video, our method aims to generate a high-quality try-on video of the person wearing the given garment while preserving the person's identity and motion. Image-based virtual try-on has shown impressive results; however, existing video virtual try-on (VVT) methods still lack garment detail and temporal consistency. To address these issues, we propose a diffusion-based architecture for video virtual try-on, split classifier-free guidance for increased control over the conditioning inputs, and a progressive temporal training strategy for single-pass 64-frame, 512px video generation. We also demonstrate the effectiveness of joint image-video training for video try-on, especially when video data is limited. Our qualitative and quantitative experiments show that our approach sets a new state of the art for video virtual try-on. For additional results, visit our project page: https://johannakarras.github.io/Fashion-VDM.
