A Unified Agentic Framework for Evaluating Conditional Image Generation

April 9, 2025
Authors: Jifang Wang, Xue Yang, Longyue Wang, Zhenran Xu, Yiyu Wang, Yaowei Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang
cs.AI

Abstract

Conditional image generation has gained significant attention for its ability to personalize content. However, the field still lacks task-agnostic, reliable, and explainable evaluation metrics. This paper introduces CIGEval, a unified agentic framework for comprehensive evaluation of conditional image generation tasks. CIGEval uses large multimodal models (LMMs) as its core, integrates a multi-functional toolbox, and establishes a fine-grained evaluation framework. Additionally, we synthesize evaluation trajectories for fine-tuning, enabling smaller LMMs to autonomously select appropriate tools and conduct nuanced analyses based on tool outputs. Experiments across seven prominent conditional image generation tasks show that CIGEval (GPT-4o version) achieves a correlation of 0.4625 with human assessments, closely matching the inter-annotator correlation of 0.47. Moreover, when implemented with 7B open-source LMMs using only 2.3K training trajectories, CIGEval surpasses the previous GPT-4o-based state-of-the-art method. Case studies on GPT-4o image generation highlight CIGEval's ability to identify subtle issues in subject consistency and adherence to control guidance, indicating its potential for automating the evaluation of image generation tasks with human-level reliability.
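
To make the reported agreement figures concrete, here is a minimal sketch of how a correlation between an automatic evaluator's scores and human ratings might be computed. It assumes a rank correlation (Spearman's rho); the abstract does not state which coefficient was used, and the function names and score values below are hypothetical, chosen only for illustration.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; does not handle constant inputs (zero variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five generated images (not data from the paper).
human_scores = [0.2, 0.9, 0.5, 0.7, 0.1]
evaluator_scores = [0.3, 0.8, 0.4, 0.9, 0.2]

rho = spearman(human_scores, evaluator_scores)
print(round(rho, 2))  # prints 0.9
```

Under this reading, an evaluator whose correlation with human ratings approaches the inter-annotator correlation agrees with humans about as well as humans agree with one another.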
