SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning
November 15, 2024
Authors: Zewen Chen, Juan Wang, Wen Wang, Sunhan Xu, Hang Xiong, Yun Zeng, Jian Guo, Shuxun Wang, Chunfeng Yuan, Bing Li, Weiming Hu
cs.AI
Abstract
Existing Image Quality Assessment (IQA) methods achieve remarkable success in
analyzing the quality of whole images, but few works explore quality analysis for
Regions of Interest (ROIs). The quality analysis of ROIs can provide
fine-grained guidance for image quality improvement and is crucial for
scenarios focusing on region-level quality. This paper proposes a novel
network, SEAGULL, which can SEe and Assess ROI quality with GUidance from a
Large vision-Language model. SEAGULL incorporates a vision-language model
(VLM), masks generated by Segment Anything Model (SAM) to specify ROIs, and a
meticulously designed Mask-based Feature Extractor (MFE) to extract global and
local tokens for specified ROIs, enabling accurate fine-grained IQA for ROIs.
Moreover, this paper constructs two ROI-based IQA datasets, SEAGULL-100w and
SEAGULL-3k, for training and evaluating ROI-based IQA. SEAGULL-100w comprises
about 100w (one million) synthetic distortion images with 33 million ROIs for
pre-training to improve the model's regional quality perception, and SEAGULL-3k
contains about 3k authentic distortion ROIs to enhance the model's ability to
perceive real-world distortions. After pre-training on SEAGULL-100w and
fine-tuning on SEAGULL-3k, SEAGULL shows remarkable performance on fine-grained
ROI quality assessment. Code and datasets are publicly available at
https://github.com/chencn2020/Seagull.
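
The abstract says the MFE extracts global and local tokens for a mask-specified ROI but gives no implementation detail. As a rough illustration of the general idea only, the sketch below uses plain masked average pooling: the local tokens are the feature vectors at positions inside the mask, and the global token is their mean. This is an assumption made for illustration, not the authors' MFE; the function name `masked_tokens` and the nested-list feature layout are hypothetical.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # H x W x C grid of feature vectors
Mask = List[List[int]]                # H x W grid, 1 inside the ROI, 0 outside


def masked_tokens(features: FeatureMap, mask: Mask) -> Tuple[List[float], List[List[float]]]:
    """Return (global_token, local_tokens) for the ROI selected by `mask`.

    Hypothetical helper: masked average pooling stands in for whatever
    the paper's MFE actually does.
    """
    same_rows = len(features) == len(mask)
    same_cols = all(len(fr) == len(mr) for fr, mr in zip(features, mask))
    if not (same_rows and same_cols):
        raise ValueError("features and mask must share the same H x W grid")

    # Local tokens: one feature vector per position inside the ROI (row-major order).
    local_tokens = [
        list(vec)
        for feat_row, mask_row in zip(features, mask)
        for vec, inside in zip(feat_row, mask_row)
        if inside
    ]
    if not local_tokens:
        raise ValueError("mask selects no positions")

    # Global token: channel-wise mean over the ROI positions.
    channels = len(local_tokens[0])
    global_token = [
        sum(tok[c] for tok in local_tokens) / len(local_tokens)
        for c in range(channels)
    ]
    return global_token, local_tokens


if __name__ == "__main__":
    feats = [[[1.0, 2.0], [3.0, 4.0]],
             [[5.0, 6.0], [7.0, 8.0]]]
    roi = [[1, 0],
           [0, 1]]
    print(masked_tokens(feats, roi))
```

In a real pipeline the mask would come from SAM and the feature map from the VLM's image encoder, with the mask resized to the feature grid; those steps are omitted here.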