Grounding Language in Multi-Perspective Referential Communication

Abstract

We introduce a task and dataset for referring expression generation and comprehension in multi-agent embodied environments. In this task, two agents in a shared scene must take into account each other's visual perspective, which may differ from their own, to both produce and understand references to objects in the scene and the spatial relations between them. We collect a dataset of 2,970 human-written referring expressions, each paired with human comprehension judgments, and evaluate automated models as speakers and listeners paired with human partners, finding that model performance in both reference generation and comprehension lags behind that of pairs of human agents. Finally, we experiment with training an open-weight speaker model using evidence of communicative success when paired with a listener, which improves communicative success from 58.9% to 69.3% and even outperforms the strongest proprietary model.
