Grounding Language in Multi-Perspective Referential Communication
October 4, 2024
Authors: Zineng Tang, Lingjun Mao, Alane Suhr
cs.AI
Abstract
We introduce a task and dataset for referring expression generation and
comprehension in multi-agent embodied environments. In this task, two agents in
a shared scene must take into account one another's visual perspective, which
may be different from their own, to both produce and understand references to
objects in a scene and the spatial relations between them. We collect a dataset
of 2,970 human-written referring expressions, each paired with human
comprehension judgments, and evaluate the performance of automated models as
speakers and listeners paired with human partners, finding that model
performance in both reference generation and comprehension lags behind that of
pairs of human agents. Finally, we experiment with training an open-weight
speaker model using evidence of communicative success when paired with a
listener, which raises communicative success from 58.9% to 69.3% and even
outperforms the strongest proprietary model.
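The core difficulty the task targets is that spatial terms such as "left" depend on whose viewpoint is used. The Python sketch below is a hypothetical illustration, not the authors' environment, data, or models: it assumes a flat top-down scene, agents described only by a position and a heading, and it treats communicative success as the fraction of references the listener resolves to the intended object. The names `Agent`, `side_of`, `resolve`, and `communicative_success` are all illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical top-down agent: position (x, y) and heading in radians (0 = +x)."""
    x: float
    y: float
    heading: float


def side_of(agent, obj):
    """Classify obj as 'left' or 'right' in the agent's egocentric frame.

    Uses the sign of the 2D cross product between the facing direction and
    the agent-to-object vector: positive means counterclockwise, i.e. left.
    """
    fx, fy = math.cos(agent.heading), math.sin(agent.heading)
    dx, dy = obj[0] - agent.x, obj[1] - agent.y
    cross = fx * dy - fy * dx
    if abs(cross) < 1e-9:
        return "neither"  # directly ahead of or behind the agent
    return "left" if cross > 0 else "right"


def resolve(viewer, objects, side):
    """Toy listener: return the unique object on `side` as seen by `viewer`, else None."""
    matches = [name for name, pos in objects.items() if side_of(viewer, pos) == side]
    return matches[0] if len(matches) == 1 else None


# Hypothetical toy scene: the two agents face each other across two objects.
SPEAKER = Agent(0.0, 0.0, 0.0)       # facing +x
LISTENER = Agent(4.0, 0.0, math.pi)  # facing -x
OBJECTS = {"A": (2.0, 1.0), "B": (2.0, -1.0)}


def communicative_success(perspective_aware):
    """Fraction of targets recovered when the speaker says 'the one on my <side>'."""
    hits = 0
    for target, pos in OBJECTS.items():
        side = side_of(SPEAKER, pos)  # speaker describes the target in its own frame
        # A perspective-aware listener adopts the speaker's frame; an
        # egocentric listener interprets the side in its own frame.
        viewer = SPEAKER if perspective_aware else LISTENER
        hits += resolve(viewer, OBJECTS, side) == target
    return hits / len(OBJECTS)


print("perspective-aware listener:", communicative_success(True))  # 1.0
print("egocentric listener:", communicative_success(False))        # 0.0
```

Because the agents face each other, the speaker's "left" is the listener's right, so a listener that reads the description in its own frame picks the wrong object every time, while one that adopts the speaker's viewpoint always succeeds.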