MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation
November 28, 2024
Authors: Minhyun Lee, Seungho Lee, Song Park, Dongyoon Han, Byeongho Heo, Hyunjung Shim
cs.AI
Abstract
Referring Image Segmentation (RIS) is an advanced vision-language task that
involves identifying and segmenting objects within an image based on
free-form text descriptions. While previous studies focused on aligning visual
and language features, training techniques such as data augmentation remain
underexplored. In this work, we explore effective data
augmentation for RIS and propose a novel training framework called Masked
Referring Image Segmentation (MaskRIS). We observe that conventional image
augmentations fall short for RIS, leading to performance degradation, while
simple random masking significantly enhances RIS performance. MaskRIS
uses both image and text masking, followed by Distortion-aware Contextual
Learning (DCL) to fully exploit the benefits of the masking strategy. This
approach can improve the model's robustness to occlusions, incomplete
information, and various linguistic complexities, resulting in a significant
performance improvement. Experiments demonstrate that MaskRIS can easily be
applied to various RIS models, outperforming existing methods in both fully
supervised and weakly supervised settings. Finally, MaskRIS achieves new
state-of-the-art performance on the RefCOCO, RefCOCO+, and RefCOCOg datasets. Code
is available at https://github.com/naver-ai/maskris.
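
To make the idea of input masking concrete, the following is a minimal sketch of random image-patch masking and random word masking as generic augmentations. It is not taken from the MaskRIS repository: the function names (`mask_image_patches`, `mask_text_tokens`), the patch size, the masking ratios, the fill value, and the `[MASK]` token are all assumptions chosen for illustration. Distortion-aware Contextual Learning (DCL) is not shown, since the abstract does not describe how it works.

```python
import random


def mask_image_patches(image, patch_size=2, mask_ratio=0.5, fill=0, rng=None):
    """Return a copy of `image` (a list of pixel rows) with random patches set to `fill`.

    Hypothetical helper: patch_size, mask_ratio and fill are illustrative defaults.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    # Top-left corners of a non-overlapping patch grid.
    patches = [(r, c)
               for r in range(0, height, patch_size)
               for c in range(0, width, patch_size)]
    num_masked = int(len(patches) * mask_ratio)
    masked = [row[:] for row in image]  # leave the input untouched
    for r, c in rng.sample(patches, num_masked):
        for i in range(r, min(r + patch_size, height)):
            for j in range(c, min(c + patch_size, width)):
                masked[i][j] = fill
    return masked


def mask_text_tokens(tokens, mask_ratio=0.25, mask_token="[MASK]", rng=None):
    """Return a copy of `tokens` with a random subset replaced by `mask_token`.

    Hypothetical helper: mask_ratio and mask_token are illustrative defaults.
    """
    rng = rng or random.Random()
    num_masked = int(len(tokens) * mask_ratio)
    positions = set(rng.sample(range(len(tokens)), num_masked))
    return [mask_token if i in positions else tok
            for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[1] * 4 for _ in range(4)]  # toy 4x4 single-channel image
    query = "the person wearing a red shirt on the left".split()
    print(mask_image_patches(image, rng=rng))
    print(mask_text_tokens(query, rng=rng))
```

In a training loop, a pair of helpers like these would be applied to each image and referring expression before the forward pass, while the ground-truth segmentation mask stays unchanged, so the model has to locate the target from partial visual and textual evidence.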