Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input

December 2, 2024
Authors: Francesco Taioli, Edoardo Zorzi, Gianni Franchi, Alberto Castellini, Alessandro Farinelli, Marco Cristani, Yiming Wang
cs.AI

Abstract

Existing embodied instance goal navigation tasks, driven by natural language, assume that human users provide complete and nuanced instance descriptions before navigation, which can be impractical in the real world, where human instructions may be brief and ambiguous. To bridge this gap, we propose a new task, Collaborative Instance Navigation (CoIN), in which dynamic agent-human interaction during navigation actively resolves uncertainties about the target instance through natural, template-free, open-ended dialogues. To address CoIN, we propose a novel method, Agent-user Interaction with UncerTainty Awareness (AIUTA), which leverages the perception capability of Vision Language Models (VLMs) and the capability of Large Language Models (LLMs). First, upon object detection, a Self-Questioner model initiates a self-dialogue to obtain a complete and accurate observation description, while a novel uncertainty estimation technique mitigates inaccurate VLM perception. Then, an Interaction Trigger module determines whether to ask the user a question, continue navigation, or halt it, minimizing user input. For evaluation, we introduce CoIN-Bench, a benchmark supporting both real and simulated humans. AIUTA achieves competitive performance in instance navigation against state-of-the-art methods, demonstrating great flexibility in handling user inputs.
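
The three-way decision attributed to the Interaction Trigger (ask the user, continue, or halt) can be illustrated with a small sketch. This is not the authors' implementation: the `Attribute` type, the `confidence` score, the `min_confidence` threshold, and the matching rule are all hypothetical stand-ins chosen for illustration. The abstract does not specify how uncertainty is estimated or how the trigger decides.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ASK_USER = "ask_user"
    CONTINUE = "continue"
    STOP = "stop"


@dataclass
class Attribute:
    name: str
    value: str
    confidence: float  # hypothetical certainty score in [0, 1]


def filter_uncertain(attributes, min_confidence=0.5):
    """Drop attributes the (assumed) perception model is unsure about."""
    return [a for a in attributes if a.confidence >= min_confidence]


def interaction_trigger(observed, known_target, min_confidence=0.5):
    """Choose the next action for a detected object.

    known_target maps attribute names to values already provided by the user.
    Returns (action, attribute_to_ask_about_or_None).
    """
    reliable = filter_uncertain(observed, min_confidence)
    if not reliable:
        return Action.CONTINUE, None  # nothing trustworthy to compare yet
    unknown = []
    for attr in reliable:
        if attr.name not in known_target:
            unknown.append(attr.name)
        elif known_target[attr.name] != attr.value:
            return Action.CONTINUE, None  # contradicts the target: keep exploring
    if unknown:
        return Action.ASK_USER, unknown[0]  # ask about one unresolved attribute
    return Action.STOP, None  # every reliable attribute matches the target
```

In this toy version the user is queried only when a confidently observed attribute is not yet covered by what the user has said, which is one simple way to keep the number of questions low.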
