SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning

Abstract

Existing Image Quality Assessment (IQA) methods achieve remarkable success in analyzing the quality of whole images, but few works explore quality analysis for Regions of Interest (ROIs). Quality analysis of ROIs can provide fine-grained guidance for image quality improvement and is crucial in scenarios that focus on region-level quality. This paper proposes a novel network, SEAGULL, which can SEe and Assess ROIs quality with GUidance from a Large vision-Language model. SEAGULL incorporates a vision-language model (VLM), masks generated by the Segment Anything Model (SAM) to specify ROIs, and a meticulously designed Mask-based Feature Extractor (MFE) to extract global and local tokens for the specified ROIs, enabling accurate fine-grained IQA for ROIs. Moreover, this paper constructs two ROI-based IQA datasets, SEAGULL-100w and SEAGULL-3k, for training and evaluating ROI-based IQA. SEAGULL-100w comprises about 100w (roughly one million) synthetic distortion images with 33 million ROIs, used for pre-training to improve the model's regional quality perception, and SEAGULL-3k contains about 3k authentic distortion ROIs to enhance the model's ability to perceive real-world distortions. After pre-training on SEAGULL-100w and fine-tuning on SEAGULL-3k, SEAGULL shows remarkable performance on fine-grained ROI quality assessment. Code and datasets are publicly available at https://github.com/chencn2020/Seagull.
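The abstract does not spell out how the MFE turns a mask into tokens. As a rough illustration of the general idea of mask-guided feature extraction, the sketch below averages the feature vectors that fall inside a binary ROI mask to obtain a single region-level vector. The function name `masked_mean_token`, the nested-list feature layout, and the choice of mean pooling are all assumptions made for illustration; they are not taken from SEAGULL's actual design.

```python
def masked_mean_token(features, mask):
    """Average the feature vectors inside a binary ROI mask.

    features: H x W x C nested lists of floats (a toy feature map).
    mask:     H x W nested lists of 0/1, where 1 marks the ROI.
    Returns a list of C floats.

    Hypothetical helper for illustration only, not SEAGULL's MFE.
    """
    channels = len(features[0][0])
    sums = [0.0] * channels
    area = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, inside in zip(feat_row, mask_row):
            if inside:
                area += 1
                for c in range(channels):
                    sums[c] += vec[c]
    if area == 0:
        raise ValueError("mask selects no positions")
    # Normalize by ROI area so that large and small regions are comparable.
    return [s / area for s in sums]


# Toy 2x2 feature map with 2 channels; the ROI is the top row.
features = [
    [[1.0, 0.0], [3.0, 0.0]],
    [[0.0, 8.0], [0.0, 8.0]],
]
mask = [
    [1, 1],
    [0, 0],
]
token = masked_mean_token(features, mask)
```

Because only the top row is inside the mask, the second channel (active only in the bottom row) contributes nothing to the pooled vector. A real extractor would likely operate on learned feature maps and produce several tokens per region, which this sketch does not attempt.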
