IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI

October 17, 2024
Authors: Xiaoyu Chen, Junliang Guo, Tianyu He, Chuheng Zhang, Pushi Zhang, Derek Cathera Yang, Li Zhao, Jiang Bian
cs.AI

Abstract

We introduce Image-GOal Representations (IGOR), which aim to learn a unified, semantically consistent action space across humans and various robots. Through this unified latent action space, IGOR enables knowledge transfer among large-scale robot and human activity data. We achieve this by compressing the visual changes between an initial image and its goal state into latent actions. IGOR allows us to generate latent action labels for internet-scale video data. This unified latent action space enables the training of foundation policy and world models across a wide variety of tasks performed by both robots and humans. We demonstrate that: (1) IGOR learns a semantically consistent action space for both humans and robots, characterizing various possible motions of objects that represent physical interaction knowledge; (2) IGOR can "migrate" the movements of an object in one video to other videos, even between humans and robots, by jointly using the latent action model and the world model; (3) IGOR can learn to align latent actions with natural language through the foundation policy model, and integrate latent actions with a low-level policy model to achieve effective robot control. We believe IGOR opens new possibilities for human-to-robot knowledge transfer and control.
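
To make the core idea concrete, the following is a toy sketch of compressing the change between an initial image and a goal image into a latent action, then reusing that latent on a different image. It is not the paper's architecture: the linear maps `W_enc` and `W_dec`, the dimensions, and the function names `latent_action` and `world_model` are all hypothetical stand-ins for learned networks, chosen only so the example is small and runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM = 64     # flattened toy "image" size (hypothetical)
LATENT_DIM = 4   # width of the latent action bottleneck (hypothetical)

# Fixed linear maps standing in for learned encoder/decoder networks (assumption).
W_enc = rng.normal(size=(LATENT_DIM, IMG_DIM)) / np.sqrt(IMG_DIM)
W_dec = np.linalg.pinv(W_enc)  # decoder taken as the encoder's pseudo-inverse


def latent_action(image, goal):
    """Compress the visual change (goal - image) into a low-dimensional latent."""
    return W_enc @ (goal - image)


def world_model(image, z):
    """Predict a goal image from a start image and a latent action."""
    return image + W_dec @ z


# A change that the toy bottleneck can represent exactly.
z_true = rng.normal(size=LATENT_DIM)
change = W_dec @ z_true

# Extract the latent action from one (image, goal) pair...
image_a = rng.normal(size=IMG_DIM)
goal_a = image_a + change
z = latent_action(image_a, goal_a)

# ...and apply it to an unrelated start image.
image_b = rng.normal(size=IMG_DIM)
predicted_goal_b = world_model(image_b, z)
```

In this toy setting the latent depends only on the change between the two images, not on the images themselves, so the same latent produces the same change when applied to `image_b`. That is a loose analogue of the "migration" of motion between videos described in the abstract, under the strong assumption that changes are linear and lie in the span the bottleneck can represent.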
