RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
December 13, 2024
作者: Charles Xu, Qiyang Li, Jianlan Luo, Sergey Levine
cs.AI
Abstract
Recent advances in robotic foundation models have enabled the development of
generalist policies that can adapt to diverse tasks. While these models show
impressive flexibility, their performance heavily depends on the quality of
their training data. In this work, we propose Reinforcement Learning Distilled
Generalists (RLDG), a method that leverages reinforcement learning to generate
high-quality training data for finetuning generalist policies. Through
extensive real-world experiments on precise manipulation tasks like connector
insertion and assembly, we demonstrate that generalist policies trained with
RL-generated data consistently outperform those trained with human
demonstrations, achieving up to 40% higher success rates while generalizing
better to new tasks. We also provide a detailed analysis that reveals this
performance gain stems from both optimized action distributions and improved
state coverage. Our results suggest that combining task-specific RL with
generalist policy distillation offers a promising approach for developing more
capable and efficient robotic manipulation systems that maintain the
flexibility of foundation models while achieving the performance of specialized
controllers. Videos and code can be found on our project website
https://generalist-distillation.github.io
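The abstract describes the RLDG recipe at a high level: a task-specific RL policy generates high-quality rollouts, which are then used to fine-tune (distill into) a generalist policy. The sketch below illustrates that two-stage pipeline under assumptions; the abstract does not specify any interface, so `env`, `rl_policy`, `generalist`, and `supervised_update` are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of an RLDG-style pipeline:
# (1) roll out a trained task-specific RL policy and keep successful episodes,
# (2) fine-tune a generalist policy on that data via behavioral cloning.
import numpy as np

def collect_rl_rollouts(env, rl_policy, num_episodes=100):
    """Run the task-specific RL controller and keep only successful episodes."""
    dataset = []
    for _ in range(num_episodes):
        obs, trajectory, done = env.reset(), [], False
        while not done:
            action = rl_policy(obs)                      # expert action from the RL policy
            next_obs, reward, done, info = env.step(action)
            trajectory.append((obs, action))
            obs = next_obs
        if info.get("success", False):                   # filter for high-quality data
            dataset.extend(trajectory)
    return dataset

def distill(generalist, dataset, num_steps=10_000, batch_size=256):
    """Fine-tune the generalist policy with supervised learning on RL-generated data."""
    for _ in range(num_steps):
        idx = np.random.randint(len(dataset), size=batch_size)
        obs_batch = np.stack([dataset[i][0] for i in idx])
        act_batch = np.stack([dataset[i][1] for i in idx])
        generalist.supervised_update(obs_batch, act_batch)  # e.g. regression/NLL on actions
    return generalist
```

In this reading, the RL stage supplies the optimized action distributions and broader state coverage the abstract credits for the performance gain, while the distillation stage preserves the generalist policy's flexibility across tasks.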