Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru

March 10, 2025
Authors: Dunant Cusipuma, David Ortega, Victor Flores-Benites, Arturo Deza
cs.AI

Abstract

As multimodal foundation models begin to be deployed experimentally in self-driving cars, a reasonable question to ask is how similarly to humans these systems respond in certain driving situations -- especially those that are out-of-distribution. To study this, we create the Robusto-1 dataset, which uses dashcam video data from Peru, a country with some of the most aggressive drivers in the world, a high traffic index, and a high ratio of bizarre to non-bizarre street objects likely never seen in training. In particular, to preliminarily test at a cognitive level how well foundational Visual Language Models (VLMs) compare to humans in driving, we move away from bounding boxes, segmentation maps, occupancy maps, and trajectory estimation, and instead use multimodal Visual Question Answering (VQA), comparing humans and machines through a method popular in systems neuroscience known as Representational Similarity Analysis (RSA). Depending on the type of questions we ask and the answers these systems give, we show in which cases VLMs and humans converge or diverge, allowing us to probe their cognitive alignment. We find that the degree of alignment varies significantly depending on the type of questions asked of each type of system (humans vs. VLMs), highlighting a gap in their alignment.
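As a rough illustration of the comparison method named in the abstract, the sketch below shows one common way an RSA score between humans and a VLM could be computed: build a representational dissimilarity matrix (RDM) over the answers each system gives to the same set of questions, then correlate the two RDMs. The embedding step, array shapes, and helper names (`rdm`, `rsa_score`) are assumptions for illustration, not the authors' implementation.

```python
# Minimal RSA sketch, assuming human and VLM answers to the same VQA prompts
# have already been embedded as fixed-length vectors (embedding model and
# data loading are hypothetical placeholders).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def rdm(embeddings: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix: pairwise (correlation) distances
    between the answer embeddings of one system (humans or a VLM)."""
    return squareform(pdist(embeddings, metric="correlation"))

def rsa_score(rdm_a: np.ndarray, rdm_b: np.ndarray) -> float:
    """Spearman correlation between the upper triangles of two RDMs;
    higher values indicate closer representational alignment."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return rho

# Hypothetical usage: answer embeddings for the same 50 driving VQA questions
# from humans and from a VLM (n_questions x embedding_dim), here random data.
human_emb = np.random.rand(50, 384)
vlm_emb = np.random.rand(50, 384)
print(f"Human-VLM RSA (Spearman rho): {rsa_score(rdm(human_emb), rdm(vlm_emb)):.3f}")
```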
