RePOPE: Impact of Annotation Errors on the POPE Benchmark
April 22, 2025
Authors: Yannic Neuhaus, Matthias Hein
cs.AI
Abstract
Since data annotation is costly, benchmark datasets often incorporate labels
from established image datasets. In this work, we assess the impact of label
errors in MSCOCO on the frequently used object hallucination benchmark POPE. We
re-annotate the benchmark images and identify an imbalance in annotation errors
across different subsets. Evaluating multiple models on the revised labels,
which we denote as RePOPE, we observe notable shifts in model rankings,
highlighting the impact of label quality. Code and data are available at
https://github.com/YanNeu/RePOPE.