Referring to Any Person
March 11, 2025
Authors: Qing Jiang, Lin Wu, Zhaoyang Zeng, Tianhe Ren, Yuda Xiong, Yihao Chen, Qin Liu, Lei Zhang
cs.AI
Abstract
Humans are undoubtedly the most important participants in computer vision,
and the ability to detect any individual given a natural language description,
a task we define as referring to any person, holds substantial practical value.
However, we find that existing models generally fail to achieve real-world
usability, and current benchmarks are limited by their focus on one-to-one
referring, which hinders progress in this area. In this work, we revisit this
task from three critical perspectives: task definition, dataset design, and
model architecture. We first identify five aspects of referable entities and
three distinctive characteristics of this task. Next, we introduce HumanRef, a
novel dataset designed to tackle these challenges and better reflect real-world
applications. From a model design perspective, we integrate a multimodal large
language model with an object detection framework, constructing a robust
referring model named RexSeek. Experimental results reveal that
state-of-the-art models, which perform well on commonly used benchmarks like
RefCOCO/+/g, struggle with HumanRef due to their inability to detect multiple
individuals. In contrast, RexSeek not only excels in human referring but also
generalizes effectively to common object referring, making it broadly
applicable across various perception tasks. Code is available at
https://github.com/IDEA-Research/RexSeek