GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
March 13, 2025
Authors: Rui Hu, Lianghui Zhu, Yuxuan Zhang, Tianheng Cheng, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang
cs.AI
Abstract
Pixel grounding, encompassing tasks such as Referring Expression Segmentation
(RES), has garnered considerable attention due to its immense potential for
bridging the gap between vision and language modalities. However, advancements
in this domain are currently constrained by limitations inherent in existing
datasets, including limited object categories, insufficient textual diversity,
and a scarcity of high-quality annotations. To mitigate these limitations, we
introduce GroundingSuite, which comprises: (1) an automated data annotation
framework leveraging multiple Vision-Language Model (VLM) agents; (2) a
large-scale training dataset encompassing 9.56 million diverse referring
expressions and their corresponding segmentations; and (3) a meticulously
curated evaluation benchmark consisting of 3,800 images. The GroundingSuite
training dataset facilitates substantial performance improvements, enabling
models trained on it to achieve state-of-the-art results: a cIoU of 68.9 on
gRefCOCO and a gIoU of 55.3 on RefCOCOm. Moreover, the GroundingSuite
annotation framework is markedly more efficient than the current leading data
annotation method, GLaMM, running 4.5 times faster.
AI-Generated Summary
Paper Overview
Core Contributions
- Introduces GroundingSuite, comprising an automated data annotation framework, a large-scale training dataset, and a carefully curated evaluation benchmark.
- Achieves efficient data annotation through multiple Vision-Language Model (VLM) agents.
- Provides a large-scale training dataset of 9.56 million diverse referring expressions with their corresponding segmentations.
- Builds an evaluation benchmark of 3,800 images covering a variety of segmentation scenarios.
Research Background
- Pixel grounding tasks such as Referring Expression Segmentation have drawn attention for bridging the vision and language modalities.
- Existing datasets are limited: few object categories, insufficient textual diversity, and scarce high-quality annotations.
Keywords
- Pixel grounding
- Referring expression segmentation
- Vision-language models
- Automated annotation
- Large-scale datasets
Background
Research Gaps
- Existing datasets cannot support open-vocabulary understanding, fine-grained segmentation, or complex scene compositions.
- Automatic annotation methods suffer from low annotation quality and high cost.
Technical Challenges
- Precisely localizing objects or regions in complex visual scenes.
- Generating unambiguous language descriptions.
- Filtering noisy annotations to ensure data quality.
Prior Methods
- Automatic annotation methods such as GLaMM and MRES suffer from textual ambiguity and high cost.
- The RefCOCO family of datasets inherits COCO's category restrictions and cannot evaluate open-vocabulary or cross-category referring segmentation.
Methodology
Technical Architecture
- The GSSculpt framework: entity spatial localization, referring text generation, and noise filtering.
- Uses SAM2 to generate high-quality segmentation masks.
- Uses multimodal models to generate unambiguous referring text.
Implementation Details
- Entity spatial localization: generate a global caption, perform phrase grounding with Florence-2, and produce masks with SAM2.
- Referring text generation: design prompt templates and use InternVL2.5 to generate rich descriptions.
- Noise filtering: use the EVF-SAM model to filter out inaccurate text-mask pairs.
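The three stages above can be sketched as a single pipeline. The model wrappers below (`ground_phrases`, `segment`, `describe`, `verify_mask`) are hypothetical placeholders standing in for Florence-2, SAM2, InternVL2.5, and EVF-SAM respectively; they are not the paper's actual interfaces, and the IoU-agreement filter is one plausible reading of the noise-filtering step.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def annotate_image(image, ground_phrases, segment, describe, verify_mask,
                   iou_threshold=0.5):
    """Three-stage annotation: localize entities, generate referring
    text, then drop text-mask pairs the verifier disagrees with."""
    samples = []
    # Stage 1: entity spatial localization (phrase grounding + mask).
    for phrase, box in ground_phrases(image):
        mask = segment(image, box)
        # Stage 2: referring text generation for the masked entity.
        expression = describe(image, mask, phrase)
        # Stage 3: noise filtering -- re-segment from the expression
        # alone and keep the pair only if the two masks agree.
        check = verify_mask(image, expression)
        if mask_iou(mask, check) >= iou_threshold:
            samples.append({"expression": expression, "mask": mask})
    return samples
```

Filtering by cross-model mask agreement means no human verification is needed per pair, which is what makes the pipeline's 4.5x throughput gain over GLaMM plausible.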
Innovations
- A VLM-based automatic annotation framework that substantially reduces the number of annotation steps.
- A large-scale training dataset and evaluation benchmark supporting diverse segmentation scenarios.
Results
Experimental Setup
- Large-scale training data is annotated on the SA-1B dataset.
- The evaluation benchmark, GSEval, contains 3,800 images covering four segmentation scenarios.
Key Findings
- Models trained on the dataset improve markedly on the gRefCOCO and RefCOCOm benchmarks.
- The GSSculpt framework is 4.5 times faster than GLaMM.
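The two metrics reported above weight samples differently. In the RES literature, cIoU usually accumulates intersections and unions over the whole set before dividing (so large objects dominate), while gIoU averages per-image IoU (so every image counts equally). A minimal sketch under those common definitions, not taken from the paper itself:

```python
import numpy as np

def ciou(preds, gts):
    """Cumulative IoU: sum all intersections and all unions across the
    dataset, then divide once. Large objects dominate the score."""
    inter = sum(np.logical_and(p, g).sum() for p, g in zip(preds, gts))
    union = sum(np.logical_or(p, g).sum() for p, g in zip(preds, gts))
    return inter / union if union > 0 else 0.0

def giou(preds, gts):
    """Mean per-image IoU: every image contributes equally,
    regardless of object size."""
    ious = []
    for p, g in zip(preds, gts):
        u = np.logical_or(p, g).sum()
        ious.append(np.logical_and(p, g).sum() / u if u > 0 else 1.0)
    return float(np.mean(ious))
```

Because of this weighting difference, a model that segments large objects well but misses small ones can score much higher on cIoU than on gIoU.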
Limitations
- Some fine-grained segmentation tasks remain challenging.
- Open-vocabulary understanding still has room for improvement.
Conclusion
- GroundingSuite lays a solid foundation for vision-language understanding and supports future research.
- The proposed dataset and framework achieve state-of-the-art performance on multiple benchmarks.