Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models
September 30, 2024
Authors: Qi Wu, Zipeng Fu, Xuxin Cheng, Xiaolong Wang, Chelsea Finn
cs.AI
Abstract
Learning-based methods have achieved strong performance for quadrupedal
locomotion. However, several challenges prevent quadrupeds from learning
helpful indoor skills that require interaction with environments and humans:
lack of end-effectors for manipulation, limited semantic understanding using
only simulation data, and low traversability and reachability in indoor
environments. We present a system for quadrupedal mobile manipulation in indoor
environments. It uses a front-mounted gripper for object manipulation, a
low-level controller trained in simulation using egocentric depth for agile
skills like climbing and whole-body tilting, and pre-trained vision-language
models (VLMs) with a third-person fisheye and an egocentric RGB camera for
semantic understanding and command generation. We evaluate our system in two
unseen environments without any real-world data collection or training. Our
system can zero-shot generalize to these environments and complete tasks, like
following a user's command to fetch a randomly placed stuffed toy after climbing
over a queen-sized bed, with a 60% success rate. Project website:
https://helpful-doggybot.github.io/
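
To make the two-level division of labor concrete, below is a minimal Python sketch of how such a pipeline might be wired together: a pre-trained VLM turns a user request plus camera views into simple commands, and a simulation-trained low-level policy tracks them. Everything here is an illustrative assumption, not the authors' code: the Command fields, the Robot interface, and query_vlm are hypothetical placeholders.

```python
# Hypothetical sketch of the system described in the abstract.
# All names below (Command fields, Robot interface, query_vlm) are
# illustrative assumptions, not the paper's actual interfaces.
from dataclasses import dataclass

@dataclass
class Command:
    vx: float        # forward velocity target (m/s) -- assumed field
    yaw_rate: float  # turning rate target (rad/s) -- assumed field
    pitch: float     # whole-body tilt target (rad) -- assumed field

class Robot:
    """Stub robot so the sketch runs end-to-end; a real system would
    wrap the quadruped's sensors and actuators here."""
    def fisheye(self): return None         # third-person fisheye frame
    def rgb(self): return None             # egocentric RGB frame
    def depth(self): return None           # egocentric depth frame
    def holding_object(self): return True  # stub: report success at once
    def step(self, joint_targets): pass

def query_vlm(instruction: str, fisheye_img, ego_img) -> Command:
    """Ask a pre-trained VLM to localize the requested object and emit
    a command. A real implementation would send both images and the
    instruction to a VLM and parse a structured reply; this fixed
    return value just keeps the sketch self-contained."""
    return Command(vx=0.3, yaw_rate=0.0, pitch=0.0)

def low_level_policy(cmd: Command, depth_obs):
    """Stand-in for the depth-conditioned controller trained in
    simulation; returns dummy targets for a 12-joint quadruped."""
    return [0.0] * 12

def fetch(instruction: str, robot: Robot) -> None:
    # Alternate VLM command generation with low-level tracking until
    # the gripper reports a successful grasp.
    while not robot.holding_object():
        cmd = query_vlm(instruction, robot.fisheye(), robot.rgb())
        robot.step(low_level_policy(cmd, robot.depth()))

fetch("Bring me the stuffed toy on the bed.", Robot())
```

In systems structured this way, the VLM typically runs at a much lower rate than the locomotion controller, which is one reason to keep the command interface between the two layers this small.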