Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

Abstract

Reinforcement learning (RL) holds great promise for enabling autonomous acquisition of complex robotic manipulation skills, but realizing this potential in real-world settings has been challenging. We present a human-in-the-loop vision-based RL system that demonstrates impressive performance on a diverse set of dexterous manipulation tasks, including dynamic manipulation, precision assembly, and dual-arm coordination. Our approach integrates demonstrations and human corrections, efficient RL algorithms, and other system-level design choices to learn policies that achieve near-perfect success rates and fast cycle times within just 1 to 2.5 hours of training. We show that our method significantly outperforms imitation learning baselines and prior RL approaches, with an average 2x improvement in success rate and 1.8x faster execution. Through extensive experiments and analysis, we provide insights into the effectiveness of our approach, demonstrating how it learns robust, adaptive policies for both reactive and predictive control strategies. Our results suggest that RL can indeed learn a wide range of complex vision-based manipulation policies directly in the real world within practical training times. We hope this work will inspire a new generation of learned robotic manipulation techniques, benefiting both industrial applications and research advancements. Videos and code are available at our project website https://hil-serl.github.io/.
