Data Scaling Laws in Imitation Learning for Robotic Manipulation

October 24, 2024
Authors: Fanqi Lin, Yingdong Hu, Pingyue Sheng, Chuan Wen, Jiacheng You, Yang Gao
cs.AI

Abstract

Data scaling has revolutionized fields like natural language processing and computer vision, providing models with remarkable generalization capabilities. In this paper, we investigate whether similar data scaling laws exist in robotics, particularly in robotic manipulation, and whether appropriate data scaling can yield single-task robot policies that can be deployed zero-shot for any object within the same category in any environment. To this end, we conduct a comprehensive empirical study on data scaling in imitation learning. By collecting data across numerous environments and objects, we study how a policy's generalization performance changes with the number of training environments, objects, and demonstrations. Throughout our research, we collect over 40,000 demonstrations and execute more than 15,000 real-world robot rollouts under a rigorous evaluation protocol. Our findings reveal several intriguing results: the generalization performance of the policy follows a roughly power-law relationship with the number of environments and objects. The diversity of environments and objects is far more important than the absolute number of demonstrations; once the number of demonstrations per environment or object reaches a certain threshold, additional demonstrations have minimal effect. Based on these insights, we propose an efficient data collection strategy. With four data collectors working for one afternoon, we collect sufficient data to enable the policies for two tasks to achieve approximately 90% success rates in novel environments with unseen objects.
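
To make the "roughly power-law relationship" concrete, here is a minimal Python sketch of how such a trend could be checked: fit y = a * x^b by linear regression in log-log space. The abstract does not give the exact functional form, the performance metric, or the fitting procedure, so the function name `fit_power_law`, the choice of an error-like metric, and the data points below are all assumptions for illustration. The numbers are synthetic and are not results from the paper.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on (log x, log y).

    Returns (a, b, r), where r is the Pearson correlation between
    log x and log y. Values of |r| near 1 indicate a near-linear
    log-log trend, i.e. an approximate power law.
    """
    log_x = np.log(np.asarray(x, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    # polyfit returns coefficients from highest degree down: [slope, intercept]
    b, log_a = np.polyfit(log_x, log_y, 1)
    r = np.corrcoef(log_x, log_y)[0, 1]
    return float(np.exp(log_a)), float(b), float(r)


# Synthetic example (hypothetical, NOT data from the paper): an error-like
# metric that shrinks as the number of training environments grows.
num_envs = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
error = 0.8 * num_envs ** -0.5

a, b, r = fit_power_law(num_envs, error)
print(f"a={a:.3f}, b={b:.3f}, r={r:.3f}")
```

With real measurements the points would scatter around the fitted line, and the correlation coefficient gives a rough sense of how well a power law describes them. A negative exponent b would mean the metric falls as environments or objects are added.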
