SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
December 7, 2024
Authors: Gengze Zhou, Yicong Hong, Zun Wang, Chongyang Zhao, Mohit Bansal, Qi Wu
cs.AI
Abstract
The academic field of learning instruction-guided visual navigation can be
generally categorized into high-level category-specific search and low-level
language-guided navigation, depending on the granularity of language
instruction, in which the former emphasizes the exploration process, while the
latter concentrates on following detailed textual commands. Despite the
differing focuses of these tasks, the underlying requirements of interpreting
instructions, comprehending the surroundings, and inferring action decisions
remain consistent. This paper consolidates diverse navigation tasks into a
unified and generic framework -- we investigate the core difficulties of
sharing general knowledge and exploiting task-specific capabilities in learning
navigation and propose a novel State-Adaptive Mixture of Experts (SAME) model
that effectively enables an agent to infer decisions based on
different-granularity language and dynamic observations. Powered by SAME, we
present a versatile agent capable of addressing seven navigation tasks
simultaneously, which outperforms or achieves performance highly comparable to
task-specific agents.
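The abstract does not say how SAME routes inputs to its experts. One plausible reading of "state-adaptive" is a mixture-of-experts layer whose gate is conditioned on the agent's current state, here taken to be the concatenation of a language feature and an observation feature, so that different instructions and views can activate different experts. The sketch below is a generic top-k mixture of experts under that assumption. It is not the authors' implementation. The class `StateAdaptiveMoE`, its `gate` and `forward` methods, the `top_k` parameter, and the use of single linear maps as experts are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(logits: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights: Matrix, x: Vector) -> Vector:
    # Plain matrix-vector product: one dot product per row.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class StateAdaptiveMoE:
    """Hypothetical MoE layer whose router sees the agent state (text + observation)."""

    def __init__(self, dim: int, num_experts: int, top_k: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Each expert is a dim x dim linear map (an illustrative assumption).
        self.experts = [
            [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        # The router maps the concatenated 2*dim state to one logit per expert.
        self.router = [[rng.gauss(0.0, 0.1) for _ in range(2 * dim)] for _ in range(num_experts)]

    def gate(self, text_feat: Vector, obs_feat: Vector) -> Dict[int, float]:
        # The state is the concatenation of language and observation features.
        state = text_feat + obs_feat
        probs = softmax(matvec(self.router, state))
        # Keep the top-k experts and renormalize their weights to sum to 1.
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in top)
        return {i: probs[i] / norm for i in top}

    def forward(self, x: Vector, text_feat: Vector, obs_feat: Vector) -> Tuple[Vector, Dict[int, float]]:
        weights = self.gate(text_feat, obs_feat)
        out = [0.0] * len(x)
        # Output is the gate-weighted sum of the selected experts' outputs.
        for i, w in weights.items():
            y = matvec(self.experts[i], x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, weights


# Usage: route one token feature given a language feature and a view feature.
layer = StateAdaptiveMoE(dim=4, num_experts=3, top_k=2)
x = [1.0, 0.5, -0.5, 0.0]
text = [0.2, 0.1, 0.0, -0.3]
obs = [0.0, 0.4, 0.1, 0.2]
out, weights = layer.forward(x, text, obs)
print(len(out), sorted(weights))
```

Because the gate reads both the instruction and the observation, the same token can be sent to different experts at different steps of an episode. A task-wise router, by contrast, would fix the expert choice once per task.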