Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction

Abstract

Humans can learn to manipulate new objects by simply watching others; providing robots with the ability to learn from such demonstrations would enable a natural interface for specifying new behaviors. This work develops Robot See Robot Do (RSRD), a method for imitating articulated object manipulation from a single monocular RGB human demonstration given a single static multi-view object scan. We first propose 4D Differentiable Part Models (4D-DPM), a method for recovering 3D part motion from a monocular video with differentiable rendering. This analysis-by-synthesis approach uses part-centric feature fields in an iterative optimization, which enables the use of geometric regularizers to recover 3D motions from only a single video. Given this 4D reconstruction, the robot replicates object trajectories by planning bimanual arm motions that induce the demonstrated object part motion. By representing demonstrations as part-centric trajectories, RSRD focuses on replicating the demonstration's intended behavior while considering the robot's own morphological limits, rather than attempting to reproduce the hand's motion. We evaluate 4D-DPM's 3D tracking accuracy on ground-truth annotated 3D part trajectories, and RSRD's physical execution performance on 9 objects across 10 trials each on a bimanual YuMi robot. Each phase of RSRD achieves an average success rate of 87%, for a total end-to-end success rate of 60% across 90 trials. Notably, this is accomplished using only feature fields distilled from large pretrained vision models, without any task-specific training, fine-tuning, dataset collection, or annotation. Project page: https://robot-see-robot-do.github.io
