
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning

December 13, 2024
Authors: Charles Xu, Qiyang Li, Jianlan Luo, Sergey Levine
cs.AI

Abstract

Recent advances in robotic foundation models have enabled the development of generalist policies that can adapt to diverse tasks. While these models show impressive flexibility, their performance heavily depends on the quality of their training data. In this work, we propose Reinforcement Learning Distilled Generalists (RLDG), a method that leverages reinforcement learning to generate high-quality training data for finetuning generalist policies. Through extensive real-world experiments on precise manipulation tasks like connector insertion and assembly, we demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations, achieving up to 40% higher success rates while generalizing better to new tasks. We also provide a detailed analysis that reveals this performance gain stems from both optimized action distributions and improved state coverage. Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems that maintain the flexibility of foundation models while achieving the performance of specialized controllers. Videos and code can be found on our project website https://generalist-distillation.github.io
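The abstract describes a two-stage recipe: train a task-specific RL controller, collect its successful rollouts as high-quality demonstrations, and then finetune a generalist policy on that data. The sketch below is a minimal illustration of this data flow, not the authors' released code; the helper names (train_rl_policy, rollout, finetune_generalist) are hypothetical placeholders standing in for the actual RL training and finetuning machinery.

```python
# Minimal sketch of the RLDG pipeline described in the abstract.
# All helpers passed in are assumed/hypothetical, not an official API.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Trajectory = List[Tuple[object, object]]  # sequence of (observation, action) pairs


@dataclass
class RLDGPipeline:
    # task name -> trained task-specific RL (specialist) policy
    train_rl_policy: Callable[[str], Callable]
    # (specialist policy, task name) -> (trajectory, success flag)
    rollout: Callable[[Callable, str], Tuple[Trajectory, bool]]
    # finetunes the generalist policy on the pooled RL-generated trajectories
    finetune_generalist: Callable[[List[Trajectory]], None]
    num_rollouts: int = 100
    dataset: List[Trajectory] = field(default_factory=list)

    def collect(self, task: str) -> None:
        """Train an RL specialist for one task and keep only successful rollouts."""
        specialist = self.train_rl_policy(task)
        for _ in range(self.num_rollouts):
            traj, success = self.rollout(specialist, task)
            if success:  # filter to high-quality data before distillation
                self.dataset.append(traj)

    def distill(self) -> None:
        """Finetune the generalist policy on the pooled RL-generated data."""
        self.finetune_generalist(self.dataset)
```

In this reading, the quality gain reported in the abstract comes from the collection step: the specialist's optimized action distribution and broader state coverage replace noisier human demonstrations as the finetuning data source.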
