RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning

December 13, 2024
Authors: Charles Xu, Qiyang Li, Jianlan Luo, Sergey Levine
cs.AI

Abstract

Recent advances in robotic foundation models have enabled the development of generalist policies that can adapt to diverse tasks. While these models show impressive flexibility, their performance heavily depends on the quality of their training data. In this work, we propose Reinforcement Learning Distilled Generalists (RLDG), a method that leverages reinforcement learning to generate high-quality training data for finetuning generalist policies. Through extensive real-world experiments on precise manipulation tasks like connector insertion and assembly, we demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations, achieving up to 40% higher success rates while generalizing better to new tasks. We also provide a detailed analysis that reveals this performance gain stems from both optimized action distributions and improved state coverage. Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems that maintain the flexibility of foundation models while achieving the performance of specialized controllers. Videos and code can be found on our project website https://generalist-distillation.github.io.
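
The method as summarized above follows a simple loop: train a task-specific RL controller, roll it out to collect high-quality episodes, and fine-tune the generalist policy on that data. The sketch below illustrates this flow; it is not the authors' released code, and every name in it (make_env, train_rl_expert, the finetune method, the "success" flag) is a hypothetical placeholder.

```python
# Minimal sketch of an RLDG-style pipeline: train a task-specific RL expert,
# roll it out to collect successful episodes, then fine-tune the generalist
# policy on the collected data. All helper names here are placeholders.

def rldg_distill(tasks, generalist_policy, make_env, train_rl_expert,
                 rollouts_per_task=100):
    """Generate RL data for each task and fine-tune the generalist on it."""
    dataset = []
    for task in tasks:
        env = make_env(task)
        expert = train_rl_expert(env)               # task-specific RL policy
        for _ in range(rollouts_per_task):
            obs = env.reset()
            done, info, episode = False, {}, []
            while not done:
                action = expert(obs)                # query the RL expert
                next_obs, reward, done, info = env.step(action)
                episode.append({"obs": obs, "action": action, "task": task})
                obs = next_obs
            if info.get("success", False):          # keep successful rollouts
                dataset.extend(episode)
    generalist_policy.finetune(dataset)             # behavior-cloning update
    return generalist_policy
```

The success filter and per-task labeling are assumptions of this sketch; the paper's analysis attributes the gains to the optimized action distributions and improved state coverage of the RL-generated data used for fine-tuning.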
