Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

December 11, 2024
Authors: Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang
cs.AI

Abstract

Creating high-quality data for training robust language-instructed agents is a long-standing challenge in embodied AI. In this paper, we introduce a Self-Refining Data Flywheel (SRDF) that generates high-quality, large-scale navigational instruction-trajectory pairs by iteratively refining the data pool through the collaboration of two models, an instruction generator and a navigator, without any human-in-the-loop annotation. Specifically, SRDF starts by using a base generator to create an initial data pool for training a base navigator, and then applies the trained navigator to filter the data pool. The filtered, higher-fidelity data trains a better generator, which in turn produces higher-quality data for training the next-round navigator. This flywheel establishes a self-refining data process that yields a continuously improving and highly effective dataset for large-scale language-guided navigation learning. Our experiments demonstrate that after several flywheel rounds, the navigator raises the performance boundary on the classic R2R test set from 70% to 78% SPL, surpassing human performance (76%) for the first time. Meanwhile, the process produces a superior generator, with SPICE rising from 23.5 to 26.2, better than all previous VLN instruction generation methods. Finally, we demonstrate the scalability of our method by increasing environment and instruction diversity, and show that our pre-trained navigator generalizes across various downstream navigation tasks, surpassing state-of-the-art methods by a large margin in all cases.
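The generator/navigator loop described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not the authors' implementation: the trajectory representation (tuples of viewpoint ids), the `path_fidelity` filtering score and its `threshold`, and the toy "training" functions are all assumptions for illustration, since the abstract does not specify how the navigator's filtering criterion is computed. It also assumes the next-round navigator is trained on the full regenerated pool.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical types (assumptions, not from the paper): a trajectory is a
# sequence of viewpoint ids, and a pool entry is an (instruction, trajectory) pair.
Trajectory = Tuple[int, ...]
Pair = Tuple[str, Trajectory]
Generator = Callable[[Trajectory], str]
Navigator = Callable[[str], Trajectory]


@dataclass
class FlywheelState:
    generator: Generator
    navigator: Navigator
    pool: List[Pair]


def path_fidelity(pred: Trajectory, ref: Trajectory) -> float:
    """Hypothetical score: 0 if the goal is missed, else the fraction of reference viewpoints visited."""
    if not ref or not pred or pred[-1] != ref[-1]:
        return 0.0
    return len(set(pred) & set(ref)) / len(set(ref))


def generate_pool(generator: Generator, trajectories: Sequence[Trajectory]) -> List[Pair]:
    # The generator writes one instruction per trajectory.
    return [(generator(t), t) for t in trajectories]


def filter_pool(pool: List[Pair], navigator: Navigator, threshold: float) -> List[Pair]:
    # Keep a pair only if the navigator, following the instruction, reproduces the trajectory well enough.
    return [(ins, t) for ins, t in pool if path_fidelity(navigator(ins), t) >= threshold]


def run_flywheel(
    trajectories: Sequence[Trajectory],
    base_generator: Generator,
    train_navigator: Callable[[List[Pair]], Navigator],
    train_generator: Callable[[List[Pair]], Generator],
    rounds: int,
    threshold: float,
) -> FlywheelState:
    generator = base_generator
    pool = generate_pool(generator, trajectories)  # initial data pool
    navigator = train_navigator(pool)  # base navigator
    for _ in range(rounds):
        filtered = filter_pool(pool, navigator, threshold)  # navigator filters the pool
        generator = train_generator(filtered)  # better generator from higher-fidelity data
        pool = generate_pool(generator, trajectories)  # regenerate the pool
        navigator = train_navigator(pool)  # next-round navigator (assumed: trained on full new pool)
    return FlywheelState(generator, navigator, pool)


# Toy demo: "training" is faked with fixed functions so the loop runs end to end.
def noisy_base_generator(t: Trajectory) -> str:
    # Drops the goal viewpoint for trajectories starting at 0, making those pairs low fidelity.
    return " ".join(map(str, t[:-1] if t[0] == 0 else t))


def toy_train_navigator(pool: List[Pair]) -> Navigator:
    # Toy navigator: simply parses the viewpoint ids written in the instruction.
    return lambda ins: tuple(int(x) for x in ins.split())


def toy_train_generator(filtered: List[Pair]) -> Generator:
    # Toy generator: writes the full path if any pair survived filtering, else keeps the noisy one.
    return (lambda t: " ".join(map(str, t))) if filtered else noisy_base_generator


trajs = [(0, 1, 2), (0, 3, 4), (5, 6)]
initial_filtered = filter_pool(generate_pool(noisy_base_generator, trajs), toy_train_navigator([]), 0.5)
state = run_flywheel(trajs, noisy_base_generator, toy_train_navigator, toy_train_generator, rounds=1, threshold=0.5)
```

In the toy demo only the `(5, 6)` pair survives the first filtering pass; after one round, the regenerated pool is fully reproducible by the navigator. Swapping in real models would only change the callables passed to `run_flywheel`; in this sketch, raising `threshold` trades pool size for fidelity.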