RePOPE: Impact of Annotation Errors on the POPE Benchmark
April 22, 2025
Authors: Yannic Neuhaus, Matthias Hein
cs.AI
Abstract
Since data annotation is costly, benchmark datasets often incorporate labels
from established image datasets. In this work, we assess the impact of label
errors in MSCOCO on the frequently used object hallucination benchmark POPE. We
re-annotate the benchmark images and identify an imbalance in annotation errors
across different subsets. Evaluating multiple models on the revised labels,
which we denote as RePOPE, we observe notable shifts in model rankings,
highlighting the impact of label quality. Code and data are available at
https://github.com/YanNeu/RePOPE.
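Ground-truth errors can reorder models even when their answers stay the same. The following is a minimal Python sketch of that effect. It assumes POPE's yes/no question format ("Is there a <object> in the image?") and ranks models by F1, treating "yes" as the positive class. The labels, answers, and model names (`model_a`, `model_b`) are hypothetical toy values, not data or results from RePOPE. The helpers `binary_metrics` and `rank_models` are illustrative only and are not part of the released code.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float


def binary_metrics(predictions: list[bool], labels: list[bool]) -> Metrics:
    """Precision, recall and F1 with "yes" (True) as the positive class."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p and y)      # said yes, object present
    fp = sum(1 for p, y in pairs if p and not y)  # said yes, object absent (hallucination)
    fn = sum(1 for p, y in pairs if not p and y)  # said no, object present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(precision, recall, f1)


def rank_models(answers: dict[str, list[bool]], labels: list[bool]) -> list[str]:
    """Return model names sorted by F1 against the given ground-truth labels."""
    f1 = {name: binary_metrics(preds, labels).f1 for name, preds in answers.items()}
    return sorted(f1, key=f1.get, reverse=True)


# Hypothetical toy ground truth: one entry per question about object presence.
original_labels = [True, True, True, False, False, False]
# Same questions after a hypothetical re-annotation: the fourth object is actually present.
revised_labels = [True, True, True, True, False, False]

answers = {
    "model_a": [True, True, True, False, False, False],  # agrees with the original labels
    "model_b": [True, True, True, True, False, False],   # agrees with the revised labels
}

print(rank_models(answers, original_labels))  # ['model_a', 'model_b']
print(rank_models(answers, revised_labels))   # ['model_b', 'model_a']
```

In this example, correcting one label swaps the order of the two models. On the fourth question, model_b's "yes" changes from a false positive to a true positive, and model_a's "no" changes from a true negative to a false negative.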