

Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction

September 26, 2024
Authors: Justin Kerr, Chung Min Kim, Mingxuan Wu, Brent Yi, Qianqian Wang, Ken Goldberg, Angjoo Kanazawa
cs.AI

Abstract

Humans can learn to manipulate new objects by simply watching others; providing robots with the ability to learn from such demonstrations would enable a natural interface for specifying new behaviors. This work develops Robot See Robot Do (RSRD), a method for imitating articulated object manipulation from a single monocular RGB human demonstration given a single static multi-view object scan. We first propose 4D Differentiable Part Models (4D-DPM), a method for recovering 3D part motion from a monocular video with differentiable rendering. This analysis-by-synthesis approach uses part-centric feature fields in an iterative optimization which enables the use of geometric regularizers to recover 3D motions from only a single video. Given this 4D reconstruction, the robot replicates object trajectories by planning bimanual arm motions that induce the demonstrated object part motion. By representing demonstrations as part-centric trajectories, RSRD focuses on replicating the demonstration's intended behavior while considering the robot's own morphological limits, rather than attempting to reproduce the hand's motion. We evaluate 4D-DPM's 3D tracking accuracy on ground truth annotated 3D part trajectories and RSRD's physical execution performance on 9 objects across 10 trials each on a bimanual YuMi robot. Each phase of RSRD achieves an average success rate of 87%, for a total end-to-end success rate of 60% across 90 trials. Notably, this is accomplished using only feature fields distilled from large pretrained vision models -- without any task-specific training, fine-tuning, dataset collection, or annotation. Project page: https://robot-see-robot-do.github.io
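The abstract describes 4D-DPM's core mechanism as analysis-by-synthesis: per-part poses are iteratively optimized so that differentiably rendered feature fields match the observed video, with geometric regularizers making single-video tracking well-posed. The following is a minimal sketch of such a loop in PyTorch, not the paper's implementation: `render_features` (a differentiable feature renderer for the scanned object's parts), `video_features` (per-frame feature maps distilled from a pretrained vision model), the 6-vector pose parameterization, and the simple temporal-smoothness penalty standing in for the paper's geometric regularizers are all illustrative assumptions.

```python
import torch

def track_part_poses(render_features, video_features, num_parts,
                     iters=50, lr=1e-2, reg_weight=0.1):
    """Analysis-by-synthesis tracking sketch: recover per-part 6-DoF motion
    by matching rendered feature maps to each video frame in turn.

    render_features: hypothetical differentiable renderer, pose -> (H, W, C)
    video_features:  (T, H, W, C) per-frame distilled feature maps
    """
    num_frames = video_features.shape[0]
    prev = torch.zeros(num_parts, 6)  # axis-angle + translation, identity init
    trajectory = []
    for t in range(num_frames):
        # Warm-start each frame from the previous frame's solution.
        pose = prev.clone().requires_grad_(True)
        opt = torch.optim.Adam([pose], lr=lr)
        for _ in range(iters):
            opt.zero_grad()
            # Render feature maps for the current part poses and compare
            # against the observed features for this frame.
            pred = render_features(pose)
            loss = torch.nn.functional.mse_loss(pred, video_features[t])
            # Stand-in geometric regularizer: keep part motion temporally
            # smooth by penalizing drift from the previous frame's pose.
            loss = loss + reg_weight * (pose - prev).pow(2).sum()
            loss.backward()
            opt.step()
        prev = pose.detach()
        trajectory.append(prev.clone())
    return torch.stack(trajectory)  # (T, num_parts, 6) part trajectory
```

Warm-starting each frame from the previous solution and penalizing frame-to-frame drift is one plausible way to realize the temporal coherence that the abstract attributes to its geometric regularizers; the resulting `(T, num_parts, 6)` trajectory is the kind of part-centric representation the robot would then replicate with bimanual motion planning.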

