INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation
January 30, 2025
Authors: Jian Hu, Zixu Cheng, Shaogang Gong
cs.AI
Abstract
Task-generic promptable image segmentation aims to segment diverse samples
under a single task description by using only one task-generic prompt.
Current methods leverage the generalisation capabilities of Vision-Language
Models (VLMs) to infer instance-specific prompts from these task-generic
prompts in order to guide the segmentation process. However, when VLMs
struggle to generalise to some image instances, the predicted
instance-specific prompts are poor. To solve this problem, we introduce
Instance-specific Negative Mining for Task-Generic Promptable Segmentation
(INT). The key idea of INT is to adaptively reduce the influence of
irrelevant (negative) prior knowledge whilst increasing the use of the most
plausible prior knowledge, selected by negative mining with higher contrast,
in order to optimise instance-specific prompt generation. Specifically, INT
consists of two components: (1) instance-specific prompt generation, which
progressively filters out incorrect information in prompt generation; and (2)
semantic mask generation, which ensures that the segmentation of each image
instance correctly matches the semantics of the instance-specific prompts.
INT is validated on six datasets, including camouflaged objects and medical
images, demonstrating its effectiveness, robustness and scalability.