MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing
March 31, 2025
Authors: Karim Radouane, Hanane Azzag, Mustapha Lebbah
cs.AI
Abstract
We propose a unified framework that integrates object detection (OD) and
visual grounding (VG) for remote sensing (RS) imagery. To support conventional
OD and establish an intuitive prior for the VG task, we fine-tune an open-set
object detector using referring expression data, framing it as a partially
supervised OD task. In the first stage, we construct a graph representation of
each image, comprising object queries, class embeddings, and proposal
locations. Then, our task-aware architecture processes this graph to perform
the VG task. The model consists of: (i) a multi-branch network that integrates
spatial, visual, and categorical features to generate task-aware proposals, and
(ii) an object reasoning network that assigns probabilities across proposals,
followed by a soft selection mechanism for final referring object localization.
Our model demonstrates superior performance on the OPT-RSVG and DIOR-RSVG
datasets, achieving significant improvements over state-of-the-art methods
while retaining classical OD capabilities. The code will be available in our
repository: https://github.com/rd20karim/MB-ORES.
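
The abstract does not spell out how the soft selection mechanism combines proposals, so the following is only a minimal sketch of one plausible reading: the per-proposal scores are turned into probabilities with a softmax, and the final box is the probability-weighted average of the candidate boxes. The function name `soft_select`, the `temperature` parameter, and the `[x1, y1, x2, y2]` box layout are assumptions made for illustration and are not taken from the paper or its repository.

```python
import math

def soft_select(boxes, logits, temperature=1.0):
    """Blend candidate boxes using softmax weights over their scores.

    Hypothetical helper: an illustrative reading of "soft selection",
    not the authors' implementation.
    boxes  : list of [x1, y1, x2, y2] proposals (assumed layout)
    logits : one unnormalized score per proposal
    """
    scaled = [s / temperature for s in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Probability-weighted average of each box coordinate.
    box = [sum(p * b[k] for p, b in zip(probs, boxes)) for k in range(4)]
    return box, probs

proposals = [[0, 0, 10, 10], [20, 20, 40, 40]]
box, probs = soft_select(proposals, [0.0, 0.0])
print(box, probs)
```

With equal scores the two proposals are averaged; as one score dominates, the output approaches that proposal's box, which is what distinguishes a soft selection from a hard argmax while keeping the step differentiable.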