

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models

September 30, 2024
作者: Qiaojun Yu, Siyuan Huang, Xibin Yuan, Zhengkai Jiang, Ce Hao, Xin Li, Haonan Chang, Junbo Wang, Liu Liu, Hongsheng Li, Peng Gao, Cewu Lu
cs.AI

Abstract

Previous studies on robotic manipulation have been based on a limited understanding of the underlying 3D motion constraints and affordances. To address these challenges, we propose a comprehensive paradigm, termed UniAff, that integrates 3D object-centric manipulation and task understanding in a unified formulation. Specifically, we construct a dataset labeled with manipulation-related key attributes, comprising 900 articulated objects from 19 categories and 600 tools from 12 categories. Furthermore, we leverage MLLMs to infer object-centric representations for manipulation tasks, including affordance recognition and reasoning about 3D motion constraints. Comprehensive experiments in both simulation and real-world settings indicate that UniAff significantly improves the generalization of robotic manipulation for tools and articulated objects. We hope that UniAff will serve as a general baseline for unified robotic manipulation tasks in the future. Images, videos, the dataset, and code are published on the project website at: https://sites.google.com/view/uni-aff/home
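
To make the abstract's "object-centric representation" concrete, the sketch below shows, in Python, one plausible shape for such a record: an actionable region (affordance) paired with a 3D motion constraint, from which an initial motion direction can be derived. Every name and field here (ObjectAffordance, joint_axis, end_effector_direction, and so on) is an illustrative assumption for exposition, not the actual schema or code released by UniAff.

    from dataclasses import dataclass
    from enum import Enum

    import numpy as np


    class JointType(Enum):
        """Motion-constraint types for articulated objects and rigid tools."""
        REVOLUTE = "revolute"    # rotation about an axis (e.g., a cabinet door)
        PRISMATIC = "prismatic"  # translation along an axis (e.g., a drawer)
        FIXED = "fixed"          # a rigid tool with no internal articulation


    @dataclass
    class ObjectAffordance:
        """One object-centric record: where to act and how the part moves."""
        category: str                # e.g., "drawer", "hammer" (hypothetical labels)
        affordance_mask: np.ndarray  # (H, W) binary mask of the actionable region
        joint_type: JointType        # inferred 3D motion constraint
        joint_axis: np.ndarray       # (3,) unit axis of rotation/translation
        joint_origin: np.ndarray     # (3,) a point on the axis, in the object frame


    def end_effector_direction(aff: ObjectAffordance, contact: np.ndarray) -> np.ndarray:
        """Turn the inferred motion constraint into an initial motion direction."""
        if aff.joint_type is JointType.PRISMATIC:
            return aff.joint_axis  # push/pull along the sliding axis
        if aff.joint_type is JointType.REVOLUTE:
            # Move tangent to the circle around the hinge through the contact point.
            radius = contact - aff.joint_origin
            tangent = np.cross(aff.joint_axis, radius)
            return tangent / np.linalg.norm(tangent)
        raise ValueError("a fixed/rigid tool is moved as a whole via its grasp pose")

Under these assumptions, a planner could call end_effector_direction at the inferred contact point to seed a motion primitive: for a drawer (prismatic) it returns the sliding axis, while for a door (revolute) it returns the tangent around the hinge.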
