RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Abstract

Existing end-to-end autonomous driving (AD) algorithms typically follow the imitation learning (IL) paradigm, which suffers from problems such as causal confusion and the open-loop gap. In this work, we establish a closed-loop reinforcement learning (RL) training paradigm based on 3D Gaussian Splatting (3DGS). Using 3DGS, we construct a photorealistic digital replica of the real physical world, which lets the AD policy explore the state space extensively and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to respond effectively to safety-critical events and to understand real-world causal relationships. To align better with human driving behavior, we incorporate IL into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared with IL-based methods, RAD achieves stronger performance on most closed-loop metrics, most notably a 3x lower collision rate. Abundant closed-loop results are presented at https://hgao-cv.github.io/RAD.
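
As a rough illustration of what "IL as a regularization term" could look like, the sketch below adds a weighted imitation penalty to an RL loss. The function name `regularized_loss`, the mean-squared-error imitation term, and the weight `il_weight` are all illustrative assumptions; the abstract does not specify how RAD actually combines the two objectives.

```python
from typing import Sequence


def regularized_loss(
    rl_loss: float,
    policy_actions: Sequence[float],
    expert_actions: Sequence[float],
    il_weight: float = 0.5,  # hypothetical weight, not taken from the paper
) -> float:
    """Return rl_loss plus a weighted imitation penalty (illustrative only)."""
    if len(policy_actions) != len(expert_actions):
        raise ValueError("action sequences must have the same length")
    if not policy_actions:
        raise ValueError("action sequences must be non-empty")
    # Mean squared error between policy and expert actions stands in
    # for the IL term; this choice is an assumption made for the sketch.
    squared_errors = [(p - e) ** 2 for p, e in zip(policy_actions, expert_actions)]
    il_loss = sum(squared_errors) / len(squared_errors)
    return rl_loss + il_weight * il_loss
```

Under these assumptions, a larger `il_weight` pulls the policy toward the demonstrated behavior, while a weight of zero reduces the objective to the plain RL loss.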
