
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning

September 23, 2024
Authors: Yinpei Dai, Jayjun Lee, Nima Fazeli, Joyce Chai
cs.AI

Abstract

Developing robust and correctable visuomotor policies for robotic manipulation is challenging due to the lack of self-recovery mechanisms from failures and the limitations of simple language instructions in guiding robot actions. To address these issues, we propose a scalable data generation pipeline that automatically augments expert demonstrations with failure recovery trajectories and fine-grained language annotations for training. We then introduce Rich languAge-guided failure reCovERy (RACER), a supervisor-actor framework that combines failure recovery data with rich language descriptions to enhance robot control. RACER features a vision-language model (VLM) that acts as an online supervisor, providing detailed language guidance for error correction and task execution, and a language-conditioned visuomotor policy as an actor to predict the next actions. Our experimental results show that RACER outperforms the state-of-the-art Robotic View Transformer (RVT) on RLBench across various evaluation settings, including standard long-horizon tasks, dynamic goal-change tasks, and zero-shot unseen tasks, achieving superior performance in both simulated and real-world environments. Videos and code are available at: https://rich-language-failure-recovery.github.io.
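
As a rough illustration of the supervisor-actor framework described in the abstract, the sketch below shows how an online VLM supervisor and a language-conditioned visuomotor actor might interact in a control loop. It is a minimal, hypothetical sketch: the class names, method signatures, and environment interface are assumptions made for clarity, not the authors' released code.

```python
# Minimal sketch of a supervisor-actor control loop in the spirit of RACER.
# All class and method names here (VLMSupervisor, VisuomotorActor,
# get_guidance, predict_action) are hypothetical, not the authors' API.

from dataclasses import dataclass
from typing import Any


@dataclass
class Observation:
    rgb: Any       # multi-view RGB images from the robot's cameras
    proprio: Any   # proprioceptive state, e.g. joint angles and gripper state


class VLMSupervisor:
    """Online supervisor: a vision-language model that inspects the current
    observation and task goal, detects failures, and emits rich language
    guidance for error correction or the next sub-goal."""

    def get_guidance(self, obs: Observation, task_goal: str) -> str:
        # A real system would query a fine-tuned VLM here; this stub just
        # echoes the task goal as the instruction.
        return f"Proceed with: {task_goal}"


class VisuomotorActor:
    """Actor: a language-conditioned visuomotor policy (e.g. an RVT-style
    model) that maps the observation plus the supervisor's instruction to
    the next robot action."""

    def predict_action(self, obs: Observation, instruction: str) -> Any:
        # Placeholder action; a real policy would run a learned network.
        return {"pose": None, "gripper": "open"}


def run_episode(env, supervisor: VLMSupervisor, actor: VisuomotorActor,
                task_goal: str, max_steps: int = 25) -> None:
    """Roll out one episode: the supervisor provides language guidance at
    every step, the actor executes, and the loop stops when the task ends."""
    obs, done = env.reset(), False
    for _ in range(max_steps):
        instruction = supervisor.get_guidance(obs, task_goal)  # rich guidance
        action = actor.predict_action(obs, instruction)        # next action
        obs, done = env.step(action)
        if done:
            break
```

In this arrangement the supervisor can inject corrective instructions after a detected failure without retraining the actor, which is the role the abstract assigns to the VLM.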
