DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
November 22, 2024
Authors: Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, Xinggang Wang
cs.AI
Abstract
Recently, the diffusion model has emerged as a powerful generative technique
for robotic policy learning, capable of modeling multi-mode action
distributions. Leveraging its capability for end-to-end autonomous driving is a
promising direction. However, the numerous denoising steps in the robotic
diffusion policy and the more dynamic, open-world nature of traffic scenes pose
substantial challenges for generating diverse driving actions at a real-time
speed. To address these challenges, we propose a novel truncated diffusion
policy that incorporates prior multi-mode anchors and truncates the diffusion
schedule, enabling the model to learn denoising from an anchored Gaussian
distribution to the multi-mode driving action distribution. Additionally, we
design an efficient cascade diffusion decoder for enhanced interaction with
conditional scene context. The proposed model, DiffusionDrive, demonstrates
a 10× reduction in denoising steps compared to the vanilla diffusion policy,
delivering superior diversity and quality in just 2 steps. On the
planning-oriented NAVSIM dataset, with the aligned ResNet-34 backbone,
DiffusionDrive achieves 88.1 PDMS without bells and whistles, setting a new
record, while running at a real-time speed of 45 FPS on an NVIDIA 4090.
Qualitative results on challenging scenarios further confirm that
DiffusionDrive can robustly generate diverse plausible driving actions. Code
and model will be available at https://github.com/hustvl/DiffusionDrive.
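The abstract describes the truncated diffusion policy only at a high level. The NumPy sketch below illustrates the core idea under stated assumptions and is not the DiffusionDrive implementation. Sampling starts from anchors noised to a small timestep instead of from pure Gaussian noise, and two DDIM-style denoising calls follow. The linear beta schedule, the truncation at step 50, the 20 fan-shaped anchors with 8 waypoints, the DDIM update rule, and every function name (`make_anchors`, `add_truncated_noise`, `ddim_step`, `nearest_anchor_denoiser`, `sample`) are hypothetical. In particular, `nearest_anchor_denoiser` is a placeholder that stands in for the cascade diffusion decoder and ignores scene context.

```python
import numpy as np

# Hypothetical settings chosen for illustration, not taken from the paper's code.
NUM_TRAIN_STEPS = 1000  # length of a standard DDPM noise schedule
TRUNC_STEP = 50         # assumed truncation: anchors are noised only up to t < 50
NUM_ANCHORS = 20        # assumed number of prior multi-mode anchors
HORIZON = 8             # assumed number of (x, y) waypoints per trajectory

# Linear beta schedule and its cumulative product alpha_bar_t.
betas = np.linspace(1e-4, 0.02, NUM_TRAIN_STEPS)
alpha_bars = np.cumprod(1.0 - betas)


def make_anchors(num_anchors=NUM_ANCHORS, horizon=HORIZON, spacing=2.0):
    """Hypothetical anchors: straight paths fanning out over +/-0.6 rad."""
    angles = np.linspace(-0.6, 0.6, num_anchors)
    dists = spacing * np.arange(1, horizon + 1)
    xs = np.outer(np.cos(angles), dists)
    ys = np.outer(np.sin(angles), dists)
    return np.stack([xs, ys], axis=-1)  # shape (num_anchors, horizon, 2)


def add_truncated_noise(anchors, t, rng):
    """Forward process started from the anchors:
    x_t = sqrt(a_t) * anchor + sqrt(1 - a_t) * eps, restricted to t < TRUNC_STEP."""
    assert 0 <= t < TRUNC_STEP
    a_t = alpha_bars[t]
    eps = rng.standard_normal(anchors.shape)
    return np.sqrt(a_t) * anchors + np.sqrt(1.0 - a_t) * eps


def ddim_step(x_t, x0_pred, t, t_prev):
    """Deterministic DDIM update from t to t_prev given a clean-sample prediction.
    t_prev = -1 returns the clean estimate itself."""
    a_t = alpha_bars[t]
    a_prev = alpha_bars[t_prev] if t_prev >= 0 else 1.0
    eps_pred = (x_t - np.sqrt(a_t) * x0_pred) / np.sqrt(1.0 - a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps_pred


def nearest_anchor_denoiser(anchors):
    """Placeholder for the cascade diffusion decoder: it ignores scene context
    and predicts, for each noisy trajectory, the closest anchor."""
    def denoise(x_t, t):
        x_scaled = x_t / np.sqrt(alpha_bars[t])  # undo the signal scaling
        # Squared distance from every noisy trajectory to every anchor: (K, K).
        d2 = ((x_scaled[:, None] - anchors[None]) ** 2).sum(axis=(-1, -2))
        return anchors[d2.argmin(axis=1)]
    return denoise


def sample(anchors, denoiser, steps=(TRUNC_STEP - 1, TRUNC_STEP // 2 - 1), rng=None):
    """Truncated sampling: noise the anchors once, then call the denoiser len(steps) times."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = add_truncated_noise(anchors, steps[0], rng)
    schedule = list(steps) + [-1]
    for t, t_prev in zip(schedule[:-1], schedule[1:]):
        x0_pred = denoiser(x, t)  # a real model would also take scene context here
        x = ddim_step(x, x0_pred, t, t_prev)
    return x  # one trajectory per anchor: (K, HORIZON, 2)


anchors = make_anchors()
plans = sample(anchors, nearest_anchor_denoiser(anchors))
print(plans.shape)  # (20, 8, 2)
```

With this placeholder denoiser, the output simply reproduces the anchors. A learned, scene-conditioned decoder would instead move each mode toward a trajectory that fits the traffic scene. The design point the sketch illustrates is that sampling begins at a small timestep (here 49 of 1000) from noised anchors rather than at the last step from pure noise. That is the intuition behind needing only two denoising calls, where a vanilla diffusion policy would need roughly 20 by the abstract's 10× figure.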