DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

November 22, 2024
Authors: Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, Xinggang Wang
cs.AI

Abstract

Recently, the diffusion model has emerged as a powerful generative technique for robotic policy learning, capable of modeling multi-mode action distributions. Leveraging its capability for end-to-end autonomous driving is a promising direction. However, the numerous denoising steps in the robotic diffusion policy and the more dynamic, open-world nature of traffic scenes pose substantial challenges for generating diverse driving actions at real-time speed. To address these challenges, we propose a novel truncated diffusion policy that incorporates prior multi-mode anchors and truncates the diffusion schedule, enabling the model to learn denoising from an anchored Gaussian distribution to the multi-mode driving action distribution. Additionally, we design an efficient cascade diffusion decoder for enhanced interaction with the conditional scene context. The proposed model, DiffusionDrive, achieves a 10× reduction in denoising steps compared to the vanilla diffusion policy, delivering superior diversity and quality in just 2 steps. On the planning-oriented NAVSIM dataset, with the aligned ResNet-34 backbone, DiffusionDrive achieves 88.1 PDMS without bells and whistles, setting a new record, while running at a real-time speed of 45 FPS on an NVIDIA 4090. Qualitative results on challenging scenarios further confirm that DiffusionDrive can robustly generate diverse, plausible driving actions. Code and model will be available at https://github.com/hustvl/DiffusionDrive.
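
To make the idea of denoising from an anchored Gaussian distribution more concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a linear beta schedule over 1000 training steps, a truncation point at step 50, two deterministic DDIM-style updates, and 2D waypoint anchors; all of these values, and every function name, are hypothetical choices made for illustration. The learned cascade diffusion decoder is replaced by a placeholder (`toy_denoiser`) that simply returns the anchor, so the sketch shows only the sampling loop, not the model.

```python
import math
import random


def make_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_train_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_anchor(anchor, alpha_bar, rng):
    """Diffuse an anchor only up to the truncated step: a sample from an anchored Gaussian."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(s * x + n * rng.gauss(0.0, 1.0), s * y + n * rng.gauss(0.0, 1.0))
            for x, y in anchor]


def toy_denoiser(noisy, anchor, t):
    """Hypothetical stand-in for a learned decoder that predicts the clean trajectory.

    A real model would condition on scene context; this placeholder returns the anchor.
    """
    return list(anchor)


def ddim_step(noisy, pred_clean, ab_t, ab_prev):
    """Deterministic DDIM-style update from noise level ab_t to ab_prev."""
    s_t, n_t = math.sqrt(ab_t), math.sqrt(1.0 - ab_t)
    s_p, n_p = math.sqrt(ab_prev), math.sqrt(1.0 - ab_prev)
    out = []
    for (xt, yt), (x0, y0) in zip(noisy, pred_clean):
        # Noise implied by the current sample and the predicted clean point.
        eps_x, eps_y = (xt - s_t * x0) / n_t, (yt - s_t * y0) / n_t
        out.append((s_p * x0 + n_p * eps_x, s_p * y0 + n_p * eps_y))
    return out


def sample_trajectories(anchors, steps=(50, 25), seed=0):
    """Run one short denoising chain per anchor, starting from the truncated step."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars()
    results = []
    for anchor in anchors:
        traj = noise_anchor(anchor, alpha_bars[steps[0]], rng)
        for i, t in enumerate(steps):
            # The last update targets the noise-free level (alpha_bar = 1).
            ab_prev = alpha_bars[steps[i + 1]] if i + 1 < len(steps) else 1.0
            pred = toy_denoiser(traj, anchor, t)
            traj = ddim_step(traj, pred, alpha_bars[t], ab_prev)
        results.append(traj)
    return results


# Two hypothetical anchors: one going straight, one bearing left.
anchors = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    [(1.0, 0.2), (2.0, 0.8), (3.0, 1.8)],
]
trajectories = sample_trajectories(anchors)
```

Because each chain starts near an anchor rather than from pure Gaussian noise, only a short tail of the schedule has to be reversed, which is the intuition behind needing very few denoising steps. With the placeholder denoiser the outputs collapse back onto the anchors; a trained decoder would instead refine each one according to the scene.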

November 28, 2024