M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment
February 21, 2025
Authors: Chuan Cui, Kejiang Chen, Zhihua Wei, Wen Shen, Weiming Zhang, Nenghai Yu
cs.AI
Abstract
The rapid advancement of AI-generated image (AGI) models has introduced
significant challenges in evaluating their quality, which requires considering
multiple dimensions such as perceptual quality, prompt correspondence, and
authenticity. To address these challenges, we propose M3-AGIQA, a comprehensive
framework for AGI quality assessment that is Multimodal, Multi-Round, and
Multi-Aspect. Our approach leverages the capabilities of Multimodal Large
Language Models (MLLMs) as joint text and image encoders and distills advanced
captioning capabilities from online MLLMs into a local model via Low-Rank
Adaptation (LoRA) fine-tuning. The framework includes a structured multi-round
evaluation mechanism, where intermediate image descriptions are generated to
provide deeper insights into the quality, correspondence, and authenticity
aspects. To align predictions with human perceptual judgments, a predictor
composed of an xLSTM and a regression head is incorporated to process
sequential logits and predict Mean Opinion Scores (MOSs). Extensive experiments
conducted on multiple benchmark datasets demonstrate that M3-AGIQA achieves
state-of-the-art performance, effectively capturing nuanced aspects of AGI
quality. Furthermore, cross-dataset validation confirms its strong
generalizability. The code is available at
https://github.com/strawhatboy/M3-AGIQA.
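
The abstract mentions distilling captioning capability from online MLLMs into a local model via LoRA fine-tuning. The following is a minimal sketch of what such a setup could look like with the Hugging Face PEFT library; it is not the authors' training code. The checkpoint name, target modules, and hyperparameters are placeholders, and only the text side of the MLLM is shown.

# Minimal sketch (not the authors' training code): wrapping a local model's
# language backbone with LoRA adapters via PEFT, so that captions produced by
# a stronger online MLLM can be distilled into it with standard supervised
# fine-tuning. Model name and target modules are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "local-mllm-checkpoint"  # hypothetical local MLLM backbone
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

lora_cfg = LoraConfig(
    r=16,                                 # low-rank dimension (illustrative)
    lora_alpha=32,                        # scaling factor (illustrative)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the LoRA adapters are trainable

# Distillation step: treat the online MLLM's caption as the target text and
# minimise the usual next-token cross-entropy on (prompt + caption).
def distill_step(prompt: str, teacher_caption: str) -> torch.Tensor:
    batch = tokenizer(prompt + teacher_caption, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])
    return out.loss  # backpropagate with an optimizer of your choice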
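
The predictor described in the abstract consumes the sequence of logits produced during the multi-round evaluation and regresses a Mean Opinion Score. A minimal PyTorch sketch follows; the paper uses an xLSTM, for which a standard nn.LSTM is substituted here purely as a stand-in, and all dimensions are illustrative assumptions.

# Minimal sketch of a score predictor over sequential logits: a recurrent
# module followed by a regression head that outputs a scalar MOS. The paper's
# xLSTM is replaced here by nn.LSTM for illustration only.
import torch
import torch.nn as nn

class MOSPredictor(nn.Module):
    def __init__(self, vocab_size: int, hidden_dim: int = 256):
        super().__init__()
        # project per-step logits (vocab-sized) to a compact feature
        self.proj = nn.Linear(vocab_size, hidden_dim)
        # stand-in for the xLSTM block described in the paper
        self.rnn = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # regression head mapping the final hidden state to a scalar MOS
        self.head = nn.Sequential(nn.Linear(hidden_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # logits: (batch, seq_len, vocab_size) from the fine-tuned MLLM
        x = self.proj(logits)
        _, (h_n, _) = self.rnn(x)               # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1]).squeeze(-1)   # (batch,) predicted MOS

# Example with random tensors standing in for MLLM output logits.
pred = MOSPredictor(vocab_size=32000)
scores = pred(torch.randn(4, 128, 32000))       # -> tensor of shape (4,)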