RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
February 18, 2025
Authors: Hao Gao, Shaoyu Chen, Bo Jiang, Bencheng Liao, Yiang Shi, Xiaoyang Guo, Yuechuan Pu, Haoran Yin, Xiangyu Li, Xinbang Zhang, Ying Zhang, Wenyu Liu, Qian Zhang, Xinggang Wang
cs.AI
Abstract
Existing end-to-end autonomous driving (AD) algorithms typically follow the
Imitation Learning (IL) paradigm, which faces challenges such as causal
confusion and the open-loop gap. In this work, we establish a 3DGS-based
closed-loop Reinforcement Learning (RL) training paradigm. By leveraging 3DGS
techniques, we construct a photorealistic digital replica of the real physical
world, enabling the AD policy to extensively explore the state space and learn
to handle out-of-distribution scenarios through large-scale trial and error. To
enhance safety, we design specialized rewards that guide the policy to
effectively respond to safety-critical events and understand real-world causal
relationships. For better alignment with human driving behavior, IL is
incorporated into RL training as a regularization term. We introduce a
closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS
environments. Compared to IL-based methods, RAD achieves stronger performance
in most closed-loop metrics, notably a 3x lower collision rate. Abundant
closed-loop results are presented at https://hgao-cv.github.io/RAD.
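To make the "IL as a regularization term" idea concrete, below is a minimal sketch of how an imitation-learning loss can be added to a policy-gradient RL objective. It assumes a simple Gaussian policy and a REINFORCE-style objective; the class `GaussianPolicy`, the weight `il_weight`, and the specific loss forms are illustrative assumptions, not RAD's exact formulation.

```python
# Minimal sketch (PyTorch): RL policy-gradient objective with an
# imitation-learning regularizer. Assumed forms, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianPolicy(nn.Module):
    """Toy Gaussian policy over continuous driving actions (assumed form)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.mean_head = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def dist(self, states):
        return torch.distributions.Normal(
            self.mean_head(states), self.log_std.exp())

def combined_loss(policy, states, sampled_actions, advantages,
                  expert_actions, il_weight=0.1):
    # RL term: REINFORCE-style policy gradient, with advantages assumed to
    # come from rollouts in the 3DGS environment (reward design not shown).
    log_probs = policy.dist(states).log_prob(sampled_actions).sum(-1)
    rl_loss = -(log_probs * advantages).mean()

    # IL regularization: behavior cloning toward human demonstrations,
    # keeping the learned policy close to human driving behavior.
    il_loss = F.mse_loss(policy.dist(states).mean, expert_actions)

    return rl_loss + il_weight * il_loss
```

The `il_weight` coefficient trades off exploration-driven RL updates against staying close to human demonstrations; the paper's actual reward and regularization design should be taken from the project page above.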