EgoNormia: Benchmarking Physical Social Norm Understanding

February 27, 2025
Authors: MohammadHossein Rezaei, Yicheng Fu, Phil Cuvin, Caleb Ziems, Yanzhe Zhang, Hao Zhu, Diyi Yang
cs.AI

Abstract

Human activity is moderated by norms. When performing actions in the real world, humans not only follow norms but also weigh the trade-offs between different norms. However, machines are often trained without explicit supervision on norm understanding and reasoning, especially when the norms are grounded in a physical and social context. To improve and evaluate the normative reasoning capability of vision-language models (VLMs), we present EgoNormia $\|\epsilon\|$, consisting of 1,853 egocentric videos of human interactions, each of which has two related questions evaluating both the prediction and the justification of normative actions. The normative actions encompass seven categories: safety, privacy, proxemics, politeness, cooperation, coordination/proactivity, and communication/legibility. To compile this dataset at scale, we propose a novel pipeline leveraging video sampling, automatic answer generation, filtering, and human validation. Our work demonstrates that current state-of-the-art vision-language models lack robust norm understanding, scoring a maximum of 45% on EgoNormia (versus a human benchmark of 92%). Our analysis of performance in each dimension highlights significant safety and privacy risks, as well as a lack of collaboration and communication capability, when these models are applied to real-world agents. We additionally show that, through a retrieval-based generation method, it is possible to use EgoNormia to enhance normative reasoning in VLMs.
