Correlation of Object Detection Performance with Visual Saliency and Depth Estimation

Abstract

As object detection techniques continue to evolve, understanding their relationships with complementary visual tasks becomes crucial for optimising model architectures and computational resources. This paper investigates the correlations between object detection accuracy and two fundamental visual tasks: depth prediction and visual saliency prediction. Through comprehensive experiments on the COCO and Pascal VOC datasets, using saliency models (DeepGaze IIE and Itti's model) and depth estimation models (Depth Anything and DPT-Large), we find that visual saliency correlates consistently more strongly with object detection accuracy (mArho up to 0.459 on Pascal VOC) than depth prediction does (mArho up to 0.283). Our analysis reveals substantial variation in these correlations across object categories, with larger objects showing correlation values up to three times higher than those of smaller objects. These findings suggest that incorporating visual saliency features into object detection architectures could be more beneficial than incorporating depth information, particularly for specific object categories. The observed category-specific variations also provide insights for targeted feature engineering and improved dataset design, potentially leading to more efficient and accurate object detection systems.
