VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer

Abstract

Crafting magic and illusions is one of the most thrilling aspects of filmmaking, with visual effects (VFX) serving as the powerhouse behind unforgettable cinematic experiences. While recent advances in generative artificial intelligence have driven progress in generic image and video synthesis, controllable VFX generation remains relatively underexplored. In this work, we propose a novel paradigm that treats animated VFX generation as image animation, where dynamic effects are generated from user-friendly textual descriptions and static reference images. Our work makes two primary contributions: (i) Open-VFX, the first high-quality VFX video dataset, spanning 15 diverse effect categories and annotated with textual descriptions, instance segmentation masks for spatial conditioning, and start-end timestamps for temporal control; and (ii) VFX Creator, a simple yet effective controllable VFX generation framework based on a Video Diffusion Transformer. The model incorporates a spatially and temporally controllable LoRA adapter and requires only a small number of training videos. Specifically, a plug-and-play mask control module enables instance-level spatial manipulation, while tokenized start-end motion timestamps, embedded in the diffusion process alongside the text encoder, allow precise control over the timing and pace of effects. Extensive experiments on the Open-VFX test set demonstrate that the proposed system generates realistic and dynamic effects, achieving state-of-the-art performance and generalization in both spatial and temporal controllability. Furthermore, we introduce a specialized metric to evaluate the precision of temporal control. By bridging traditional VFX techniques with generative approaches, VFX Creator unlocks new possibilities for efficient, high-quality video effect generation, making advanced VFX accessible to a broader audience.
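The abstract attributes the small training-data requirement to a LoRA adapter. The sketch below illustrates the standard LoRA formulation (a frozen pretrained weight W plus a trainable low-rank update BA, scaled by alpha/r), not the authors' spatial-temporal adapter, whose internal design the abstract does not specify. The class name `LoRALinear`, the plain-list matrices, and the initialization constants are illustrative assumptions chosen so the example runs with the standard library only.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Hypothetical illustration: frozen linear layer W plus a low-rank update.

    Forward pass: y = W x + (alpha / r) * B (A x),
    with W of shape (d_out x d_in), A of shape (r x d_in), B of shape (d_out x r).
    In LoRA fine-tuning only A and B are trained; W stays frozen.
    """

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        self.weight = weight                  # frozen pretrained weight
        self.d_out = len(weight)
        self.d_in = len(weight[0])
        self.rank = rank
        self.scale = alpha / rank
        rng = random.Random(seed)
        # A starts with small Gaussian values; B starts at zero, so the
        # adapter leaves the base layer's output unchanged at initialization.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)                 # frozen path
        delta = matvec(self.B, matvec(self.A, x))     # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        # Parameters in A and B: r * d_in + d_out * r
        return self.rank * (self.d_in + self.d_out)

    def full_params(self) -> int:
        # Parameters a full fine-tune of W would update
        return self.d_in * self.d_out


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 0.0], [0.0, 2.0]], rank=1)
    print(layer.forward([3.0, 4.0]))  # [3.0, 8.0], because B is zero at init
```

Because only A and B are updated, a 1024×1024 projection adapted at rank 8 trains 8 · (1024 + 1024) = 16,384 parameters instead of 1,048,576. This reduction is consistent with the abstract's claim that the adapter can be trained on relatively few videos, although the paper's actual rank and target layers are not stated here.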
