A Unified Agentic Framework for Evaluating Conditional Image Generation
April 9, 2025
Authors: Jifang Wang, Xue Yang, Longyue Wang, Zhenran Xu, Yiyu Wang, Yaowei Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang
cs.AI
Abstract
Conditional image generation has gained significant attention for its ability
to personalize content. However, the field faces challenges in developing
task-agnostic, reliable, and explainable evaluation metrics. This paper
introduces CIGEval, a unified agentic framework for comprehensive evaluation of
conditional image generation tasks. CIGEval utilizes large multimodal models
(LMMs) as its core, integrating a multi-functional toolbox and establishing a
fine-grained evaluation framework. Additionally, we synthesize evaluation
trajectories for fine-tuning, empowering smaller LMMs to autonomously select
appropriate tools and conduct nuanced analyses based on tool outputs.
Experiments across seven prominent conditional image generation tasks
demonstrate that CIGEval (GPT-4o version) achieves a high correlation of 0.4625
with human assessments, closely matching the inter-annotator correlation of
0.47. Moreover, when implemented with 7B open-source LMMs using only 2.3K
training trajectories, CIGEval surpasses the previous GPT-4o-based
state-of-the-art method. Case studies on GPT-4o image generation highlight
CIGEval's capability in identifying subtle issues related to subject
consistency and adherence to control guidance, indicating its great potential
for automating evaluation of image generation tasks with human-level
reliability.
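
The agreement figures quoted in the abstract (0.4625 against human assessments, 0.47 between annotators) are correlations between two sets of scores over the same images. The abstract does not name the coefficient, so the sketch below assumes a Spearman rank correlation, which is a common choice for this kind of comparison. The function names and the score lists are hypothetical and made up for illustration; they are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every value tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Made-up scores for six generated images (illustrative only).
    human_scores = [0.0, 0.5, 1.0, 0.5, 0.0, 1.0]
    evaluator_scores = [0.1, 0.6, 0.9, 0.3, 0.2, 0.7]
    print(f"Spearman correlation: {spearman(human_scores, evaluator_scores):.4f}")
```

Under this reading, an automatic evaluator whose correlation with human scores approaches the inter-annotator correlation agrees with people about as well as people agree with each other.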