Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
December 11, 2024
作者: Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang
cs.AI
Abstract
Creating high-quality data for training robust language-instructed agents is
a long-lasting challenge in embodied AI. In this paper, we introduce a
Self-Refining Data Flywheel (SRDF) that generates high-quality and large-scale
navigational instruction-trajectory pairs by iteratively refining the data pool
through the collaboration between two models, the instruction generator and the
navigator, without any human-in-the-loop annotation. Specifically, SRDF starts
with using a base generator to create an initial data pool for training a base
navigator, followed by applying the trained navigator to filter the data pool.
This leads to higher-fidelity data to train a better generator, which can, in
turn, produce higher-quality data for training the next-round navigator. Such a
flywheel establishes a data self-refining process, yielding a continuously
improved and highly effective dataset for large-scale language-guided
navigation learning. Our experiments demonstrate that after several flywheel
rounds, the navigator elevates the performance boundary from 70% to 78% SPL on
the classic R2R test set, surpassing human performance (76%) for the first
time. Meanwhile, this process results in a superior generator, evidenced by a
SPICE increase from 23.5 to 26.2, better than all previous VLN instruction
generation methods. Finally, we demonstrate the scalability of our method
through increasing environment and instruction diversity, and the
generalization ability of our pre-trained navigator across various downstream
navigation tasks, surpassing state-of-the-art methods by a large margin in all
cases.
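The flywheel alternates between training a navigator on generator-produced data, filtering that data with the trained navigator, and re-training the generator on the filtered, higher-fidelity pairs. The sketch below illustrates that loop in hypothetical Python; the `generator`/`navigator` interfaces (`describe`, `train`, `follow_score`), the `threshold` value, and the round count are illustrative assumptions, not the authors' actual code.

```python
# Minimal sketch of the Self-Refining Data Flywheel (SRDF) loop described above.
# All interfaces and parameters here are hypothetical placeholders.

def run_srdf(generator, navigator, trajectories, rounds=3, threshold=0.8):
    """Iteratively refine an instruction-trajectory pool without human annotation."""
    # Round 0: the base generator annotates unlabeled trajectories.
    pool = [(generator.describe(traj), traj) for traj in trajectories]

    for _ in range(rounds):
        # 1. Train the navigator on the current (possibly noisy) pool.
        navigator.train(pool)

        # 2. Filter: keep only pairs whose instruction the trained navigator
        #    can follow faithfully (higher-fidelity subset).
        pool = [(ins, traj) for ins, traj in pool
                if navigator.follow_score(ins, traj) >= threshold]

        # 3. Re-train the generator on the filtered, higher-fidelity pairs.
        generator.train(pool)

        # 4. The improved generator re-annotates the trajectories, yielding a
        #    higher-quality pool for the next-round navigator.
        pool = [(generator.describe(traj), traj) for traj in trajectories]

    return generator, navigator, pool
```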