DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps

February 21, 2025
Authors: Hongjie Zhu, Zeyu Zhang, Guansong Pang, Xu Wang, Shimin Wen, Yu Bai, Daji Ergu, Ying Cai, Yang Zhao
cs.AI

Abstract

Weakly supervised semantic segmentation (WSSS) typically utilizes limited semantic annotations to obtain initial Class Activation Maps (CAMs). However, due to the inadequate coupling between class activation responses and semantic information in high-dimensional space, the CAM is prone to object co-occurrence or under-activation, resulting in inferior recognition accuracy. To tackle this issue, we propose DOEI, Dual Optimization of Embedding Information, a novel approach that reconstructs embedding representations through semantic-aware attention weight matrices to optimize the expression capability of embedding information. Specifically, DOEI amplifies tokens with high confidence and suppresses those with low confidence during the class-to-patch interaction. This alignment of activation responses with semantic information strengthens the propagation and decoupling of target features, enabling the generated embeddings to more accurately represent target features in high-level semantic space. In addition, we propose a hybrid-feature alignment module in DOEI that combines RGB values, embedding-guided features, and self-attention weights to increase the reliability of candidate tokens. Comprehensive experiments show that DOEI is an effective plug-and-play module that empowers state-of-the-art visual transformer-based WSSS models to significantly improve the quality of CAMs and segmentation performance on popular benchmarks, including PASCAL VOC (+3.6%, +1.5%, +1.2% mIoU) and MS COCO (+1.2%, +1.6% mIoU). Code will be available at https://github.com/AIGeeksGroup/DOEI.
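
The amplify-and-suppress step in the class-to-patch interaction can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how confidence is measured or how strongly weights are scaled, so the min-max confidence score, the thresholds `hi` and `lo`, the factors `gain` and `damp`, and both function names are hypothetical choices made only for illustration.

```python
def reweight_class_to_patch(attn, hi=0.6, lo=0.2, gain=1.5, damp=0.5):
    """Amplify high-confidence patch weights and suppress low-confidence ones.

    attn: non-negative attention weights from one class token to each patch.
    hi, lo, gain, damp: hypothetical thresholds and scale factors (assumptions,
    not values taken from the paper).
    """
    a_min, a_max = min(attn), max(attn)
    span = (a_max - a_min) or 1.0  # avoid division by zero for flat attention
    scaled = []
    for a in attn:
        conf = (a - a_min) / span  # min-max confidence in [0, 1] (assumed proxy)
        if conf >= hi:
            a *= gain              # amplify confident patches
        elif conf <= lo:
            a *= damp              # suppress unconfident patches
        scaled.append(a)
    total = sum(scaled) or 1.0
    return [a / total for a in scaled]  # renormalize so weights sum to 1


def reconstruct_embedding(weights, patch_embeddings):
    """Rebuild a class embedding as the weighted sum of patch embeddings."""
    dim = len(patch_embeddings[0])
    return [
        sum(w * p[d] for w, p in zip(weights, patch_embeddings))
        for d in range(dim)
    ]


# Toy example: four patches with 2-dimensional embeddings.
attn = [0.5, 0.3, 0.15, 0.05]
patches = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 1.0]]
new_attn = reweight_class_to_patch(attn)
class_embedding = reconstruct_embedding(new_attn, patches)
```

In this toy input the most attended patch gains weight and the least attended patch loses weight after renormalization, so the reconstructed class embedding leans further toward the confident patch. The hybrid-feature alignment module described in the abstract (RGB values, embedding-guided features, and self-attention weights) is not modeled here.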
