MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
November 21, 2024
Authors: Ruiyuan Gao, Kai Chen, Bo Xiao, Lanqing Hong, Zhenguo Li, Qiang Xu
cs.AI
Abstract
The rapid advancement of diffusion models has greatly improved video
synthesis, especially in controllable video generation, which is essential for
applications like autonomous driving. However, existing methods are limited by
scalability and how control conditions are integrated, failing to meet the
needs for high-resolution and long videos for autonomous driving applications.
In this paper, we introduce MagicDriveDiT, a novel approach based on the DiT
architecture, to tackle these challenges. Our method enhances scalability
through flow matching and employs a progressive training strategy to manage
complex scenarios. By incorporating spatial-temporal conditional encoding,
MagicDriveDiT achieves precise control over spatial-temporal latents.
Comprehensive experiments show its superior performance in generating realistic
street scene videos with higher resolution and more frames. MagicDriveDiT
significantly improves video generation quality and spatial-temporal controls,
expanding its potential applications across various tasks in autonomous
driving.
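
The abstract names flow matching as the training formulation but does not spell it out. As a rough illustration only, the sketch below shows the generic linear-path (rectified-flow style) version of the idea: a noise sample and a data sample are interpolated, and a model would be trained to regress the constant velocity between them. The function names, array shapes, and the choice of a linear path are assumptions made for this example; they are not taken from the paper and may differ from what MagicDriveDiT actually uses.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Build one training pair for linear-path flow matching (illustrative).

    x0: noise samples, shape (batch, ...)
    x1: data samples (e.g. latents), same shape as x0
    t:  one time value in [0, 1] per batch element, shape (batch,)
    Returns the interpolated point x_t and the velocity target x1 - x0.
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    # Reshape t so it broadcasts over all non-batch dimensions.
    t = np.asarray(t, dtype=float).reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((np.asarray(v_pred) - np.asarray(v_target)) ** 2))

# Toy usage with random stand-ins for latents (hypothetical shapes).
rng = np.random.default_rng(0)
x1 = rng.normal(size=(4, 8))   # stand-in for data latents
x0 = rng.normal(size=(4, 8))   # Gaussian noise
t = rng.uniform(size=4)        # one time step per sample
x_t, v_target = flow_matching_pair(x0, x1, t)
```

In an actual training loop, a network would take `x_t`, `t`, and the control conditions as input, and its output would be passed to `flow_matching_loss` as `v_pred`.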