Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

December 11, 2024
Authors: Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang
cs.AI

Abstract

Creating high-quality data for training robust language-instructed agents is a long-standing challenge in embodied AI. In this paper, we introduce a Self-Refining Data Flywheel (SRDF) that generates high-quality, large-scale navigational instruction-trajectory pairs by iteratively refining the data pool through the collaboration between two models, the instruction generator and the navigator, without any human-in-the-loop annotation. Specifically, SRDF starts by using a base generator to create an initial data pool for training a base navigator, and then applies the trained navigator to filter the data pool. This provides higher-fidelity data for training a better generator, which in turn produces higher-quality data for training the next-round navigator. Such a flywheel establishes a self-refining data process, yielding a continuously improving and highly effective dataset for large-scale language-guided navigation learning. Our experiments demonstrate that after several flywheel rounds, the navigator raises the performance boundary on the classic R2R test set from 70% to 78% SPL, surpassing human performance (76%) for the first time. Meanwhile, this process produces a superior generator, evidenced by a SPICE increase from 23.5 to 26.2, better than all previous VLN instruction generation methods. Finally, we demonstrate the scalability of our method by increasing environment and instruction diversity, and the generalization ability of our pre-trained navigator across various downstream navigation tasks, surpassing state-of-the-art methods by a large margin in all cases.
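The generate, train, filter, retrain loop described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the navigator filters the pool, so `fidelity` (a hypothetical step-wise path-match ratio) and `threshold` stand in for whatever path-similarity criterion SRDF actually uses. `train_navigator` and `train_generator` are hypothetical placeholder callables; in SRDF they would train neural models, while the toy stubs here simply return rule-based functions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed data types: a trajectory is a sequence of viewpoint IDs,
# an instruction is a string, and a data-pool entry pairs the two.
Trajectory = Tuple[str, ...]
Pair = Tuple[str, Trajectory]
Generator = Callable[[Trajectory], str]   # trajectory -> instruction
Navigator = Callable[[str], Trajectory]   # instruction -> predicted path


def fidelity(pred: Trajectory, ref: Trajectory) -> float:
    """Hypothetical filter score: number of steps where the predicted path
    matches the reference, divided by the length of the longer path."""
    matches = sum(1 for a, b in zip(pred, ref) if a == b)
    return matches / max(len(pred), len(ref), 1)


def run_flywheel(
    trajectories: Sequence[Trajectory],
    base_generator: Generator,
    train_navigator: Callable[[List[Pair]], Navigator],
    train_generator: Callable[[List[Pair]], Generator],
    rounds: int = 3,
    threshold: float = 0.8,  # hypothetical cut-off
) -> Tuple[Generator, Navigator, List[Pair]]:
    if rounds < 1:
        raise ValueError("need at least one flywheel round")
    generator = base_generator
    navigator: Optional[Navigator] = None
    pool: List[Pair] = []
    for _ in range(rounds):
        # 1. The current generator writes an instruction for every trajectory.
        pool = [(generator(traj), traj) for traj in trajectories]
        # 2. Train this round's navigator on the freshly generated pool.
        navigator = train_navigator(pool)
        # 3. The navigator filters the pool: keep pairs it follows faithfully.
        pool = [(ins, traj) for ins, traj in pool
                if fidelity(navigator(ins), traj) >= threshold]
        # 4. Train the next-round generator on the higher-fidelity pairs.
        generator = train_generator(pool)
    assert navigator is not None
    return generator, navigator, pool


# --- Toy stand-ins (hypothetical; real SRDF trains neural models) ---
TRAJS: List[Trajectory] = [("a", "b", "c"), ("a", "d"), ("e", "f", "g")]


def rule_generator(traj: Trajectory) -> str:
    return "go " + "-".join(traj)


def base_generator(traj: Trajectory) -> str:
    # Noisy base generator: emits an unusable instruction for one path.
    return "go somewhere" if traj[0] == "e" else rule_generator(traj)


def train_navigator(pool: List[Pair]) -> Navigator:
    # Stub: ignores the pool and returns a parser for "go x-y-z" instructions.
    return lambda ins: tuple(ins.removeprefix("go ").split("-"))


def train_generator(pool: List[Pair]) -> Generator:
    # Stub: ignores the pool and returns the clean rule-based generator.
    return rule_generator


if __name__ == "__main__":
    for r in (1, 3):
        _, _, kept = run_flywheel(TRAJS, base_generator, train_navigator,
                                  train_generator, rounds=r)
        print(f"rounds={r}: kept {len(kept)} of {len(TRAJS)} pairs")
```

With these stubs, a single round keeps 2 of the 3 pairs because the navigator cannot follow the base generator's "go somewhere" instruction, so that pair is filtered out. From the second round on, the replacement generator labels every trajectory in a form the navigator can follow, and all 3 pairs survive. Keeping the models as injected callables separates the flywheel's control flow from any particular architecture.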
