AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
March 10, 2025
Authors: Bo Jiang, Shaoyu Chen, Qian Zhang, Wenyu Liu, Xinggang Wang
cs.AI
Abstract
OpenAI o1 and DeepSeek R1 achieve or even surpass human expert-level
performance in complex domains like mathematics and science, with reinforcement
learning (RL) and reasoning playing a crucial role. In autonomous driving,
recent end-to-end models have greatly improved planning performance but still
struggle with long-tailed problems due to limited common sense and reasoning
abilities. Some studies integrate vision-language models (VLMs) into autonomous
driving, but they typically rely on pre-trained models with simple supervised
fine-tuning (SFT) on driving data, without further exploration of training
strategies or optimizations specifically tailored for planning. In this paper,
we propose AlphaDrive, an RL and reasoning framework for VLMs in autonomous
driving. AlphaDrive introduces four GRPO-based RL rewards tailored for planning
and employs a two-stage planning reasoning training strategy that combines SFT
with RL. As a result, AlphaDrive significantly improves both planning
performance and training efficiency compared to using only SFT or training
without reasoning. Moreover, we are also excited to discover that, following RL
training, AlphaDrive exhibits some emergent multimodal planning capabilities,
which are critical for improving driving safety and efficiency. To the best of
our knowledge, AlphaDrive is the first to integrate GRPO-based RL with planning
reasoning into autonomous driving. Code will be released to facilitate future
research.
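
The abstract names GRPO but does not describe the four planning rewards, so the sketch below illustrates only the general group-relative idea behind GRPO: several answers are sampled for the same input, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. The function name, the example reward values, the use of the population standard deviation, and the `eps` term are assumptions made for illustration and are not taken from AlphaDrive.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled answer's reward against its own group.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when all rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scalar rewards for four planning answers sampled for one scene.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Answers that score above the group mean get a positive advantage and those below get a negative one, so no separately learned value function is needed to provide the baseline.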