
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset

October 29, 2024
作者: Guangqi Jiang, Yifei Sun, Tao Huang, Huanyu Li, Yongyuan Liang, Huazhe Xu
cs.AI

Abstract

The pre-training of visual representations has enhanced the efficiency of robot learning. Due to the lack of large-scale in-domain robotic datasets, prior works utilize in-the-wild human videos to pre-train robotic visual representations. Despite their promising results, representations from human videos are inevitably subject to distribution shifts and lack the dynamics information crucial for task completion. We first evaluate various pre-trained representations in terms of their correlation to the downstream robotic manipulation tasks (i.e., manipulation centricity). Interestingly, we find that "manipulation centricity" is a strong indicator of success rates when applied to downstream tasks. Drawing from these findings, we propose Manipulation Centric Representation (MCR), a foundation representation learning framework capturing both visual features and the dynamics information of manipulation tasks, such as actions and proprioception, to improve manipulation centricity. Specifically, we pre-train a visual encoder on the DROID robotic dataset and leverage motion-relevant data such as robot proprioceptive states and actions. We introduce a novel contrastive loss that aligns visual observations with the robot's proprioceptive state-action dynamics, combined with a behavior cloning (BC)-like actor loss to predict actions during pre-training, along with a time contrastive loss. Empirical results across 4 simulation domains with 20 tasks verify that MCR outperforms the strongest baseline method by 14.8%. Moreover, MCR boosts the performance of data-efficient learning with a UR5e arm on 3 real-world tasks by 76.9%. Project website: https://robots-pretrain-robots.github.io/.
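The abstract describes a pre-training objective with three terms: a contrastive loss aligning visual observations with state-action dynamics, a BC-like actor loss, and a time contrastive loss. The following is a minimal numpy sketch of how such a combined objective could look; the InfoNCE-style formulation, the MSE form of the actor loss, the function names, and the loss weights are all assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed formulation).

    Each anchor's positive is the matching row of `positives`; the other
    rows in the batch serve as negatives.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                   # (B, B) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # -log p(positive | anchor)

def mcr_style_loss(img_emb, img_emb_next, state_action_emb,
                   pred_actions, actions,
                   w_dyn=1.0, w_tc=1.0, w_bc=1.0):
    """Hypothetical combination of the three losses named in the abstract."""
    # Align visual embeddings with proprioceptive state-action embeddings.
    l_dyn = info_nce(img_emb, state_action_emb)
    # Time contrastive: temporally adjacent frames act as positives.
    l_tc = info_nce(img_emb, img_emb_next)
    # BC-like actor loss, here a simple MSE on predicted actions.
    l_bc = np.mean((pred_actions - actions) ** 2)
    return w_dyn * l_dyn + w_tc * l_tc + w_bc * l_bc
```

In a real pipeline the embeddings would come from the visual encoder being pre-trained and a small state-action encoder, and the loss weights would be tuned; the sketch only shows how the three terms compose into one scalar objective.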

