ILIAS: Instance-Level Image retrieval At Scale

February 17, 2025
Authors: Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko, Pavel Šuma, Nikolaos-Antonios Ypsilantis, Nikos Efthymiadis, Zakaria Laskar, Jiří Matas, Ondřej Chum, Giorgos Tolias
cs.AI

Abstract

This work introduces ILIAS, a new test dataset for Instance-Level Image retrieval At Scale. It is designed to evaluate the ability of current and future foundation models and retrieval techniques to recognize particular objects. Its key benefits over existing datasets are large scale, domain diversity, accurate ground truth, and performance that is far from saturated. ILIAS includes query and positive images for 1,000 object instances, manually collected to capture challenging conditions and diverse domains. Large-scale retrieval is conducted against 100 million distractor images from YFCC100M. To avoid false negatives without extra annotation effort, only query objects confirmed to have emerged after 2014, the compilation date of YFCC100M, are included. Extensive benchmarking yields the following observations: i) models fine-tuned on specific domains, such as landmarks or products, excel in that domain but fail on ILIAS; ii) learning a linear adaptation layer using multi-domain class supervision improves performance, especially for vision-language models; iii) local descriptors in retrieval re-ranking are still a key ingredient, especially in the presence of severe background clutter; iv) the text-to-image performance of vision-language foundation models is surprisingly close to the corresponding image-to-image case. Website: https://vrg.fel.cvut.cz/ilias/
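To make the retrieval setup concrete, here is a minimal sketch of how a single query could be scored against a database that mixes positives and distractors. The abstract does not specify the descriptors, the similarity measure, or the evaluation metric, so everything here is an assumption for illustration: toy 2-D vectors stand in for global image descriptors, ranking uses cosine similarity, and quality is measured with average precision. The function names are hypothetical and are not taken from the ILIAS code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def rank_database(query, database):
    """Return database indices sorted by descending cosine similarity to the query."""
    q = l2_normalize(query)
    sims = []
    for idx, descriptor in enumerate(database):
        d = l2_normalize(descriptor)
        sims.append((sum(a * b for a, b in zip(q, d)), idx))
    # Highest similarity first; ties are broken by the lower index.
    sims.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in sims]


def average_precision(ranking, positives):
    """Average precision of one ranked list, given the set of positive indices."""
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(ranking, start=1):
        if idx in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives) if positives else 0.0


# Toy database (assumed values): two positives of the query object, two distractors.
database = [
    [1.0, 0.0],  # 0: positive
    [0.0, 1.0],  # 1: distractor
    [0.2, 1.0],  # 2: positive under a hard condition
    [0.9, 0.5],  # 3: distractor that resembles the query
]
positives = {0, 2}
query = [1.0, 0.4]

ranking = rank_database(query, database)
ap = average_precision(ranking, positives)
print(ranking, round(ap, 4))
```

In this toy case the look-alike distractor outranks both positives, which lowers the average precision. That is the kind of failure a large distractor set is meant to expose. At the scale of 100 million images, an exhaustive Python loop like this would be replaced by an approximate nearest-neighbor index.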
