
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

December 13, 2024
Authors: Ruijie Zheng, Yongyuan Liang, Shuaiyi Huang, Jianfeng Gao, Hal Daumé III, Andrey Kolobov, Furong Huang, Jianwei Yang
cs.AI

Abstract

Although large vision-language-action (VLA) models pretrained on extensive robot datasets offer promising generalist policies for robotic learning, they still struggle with spatial-temporal dynamics in interactive robotics, making them less effective at handling complex tasks such as manipulation. In this work, we introduce visual trace prompting, a simple yet effective approach to facilitate VLA models' spatial-temporal awareness for action prediction by encoding state-action trajectories visually. We develop a new TraceVLA model by finetuning OpenVLA on our own collected dataset of 150K robot manipulation trajectories using visual trace prompting. Evaluations of TraceVLA across 137 configurations in SimplerEnv and 4 tasks on a physical WidowX robot demonstrate state-of-the-art performance, outperforming OpenVLA by 10% on SimplerEnv and by 3.5x on real-robot tasks, and exhibiting robust generalization across diverse embodiments and scenarios. To further validate the effectiveness and generality of our method, we present a compact VLA model based on the 4B Phi-3-Vision, pretrained on Open-X-Embodiment and finetuned on our dataset, which rivals the 7B OpenVLA baseline while significantly improving inference efficiency.
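
The abstract describes "encoding state-action trajectories visually" but does not specify the mechanism. A minimal sketch of one plausible reading, overlaying a 2D trace of past end-effector or keypoint positions onto the current camera frame before it is passed to the VLA model, is given below. This is not the authors' implementation; the function name, argument names, and drawing choices are assumptions for illustration.

```python
# Minimal sketch (assumed, not the authors' code): draw a visual trace of
# past 2D positions (pixel coordinates) on a copy of the current observation,
# so the policy sees recent motion history directly in the image input.
import numpy as np
import cv2


def overlay_visual_trace(frame: np.ndarray,
                         trace_xy: np.ndarray,
                         color=(0, 255, 0),
                         thickness=2) -> np.ndarray:
    """Return a copy of `frame` with a polyline through past positions drawn on it."""
    annotated = frame.copy()
    pts = trace_xy.astype(np.int32).reshape(-1, 1, 2)
    cv2.polylines(annotated, [pts], isClosed=False, color=color, thickness=thickness)
    # Mark the most recent position so the direction of motion is unambiguous.
    center = (int(pts[-1, 0, 0]), int(pts[-1, 0, 1]))
    cv2.circle(annotated, center, radius=4, color=(0, 0, 255), thickness=-1)
    return annotated


if __name__ == "__main__":
    # Dummy 256x256 observation and a synthetic trace of 10 past positions.
    frame = np.zeros((256, 256, 3), dtype=np.uint8)
    trace = np.stack([np.linspace(40, 200, 10), np.linspace(200, 80, 10)], axis=1)
    prompted = overlay_visual_trace(frame, trace)
    # `prompted` would then replace the raw frame as the model's image input.
    print(prompted.shape)
```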
