Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
December 10, 2024
Authors: Fan Zhang, Shulin Tian, Ziqi Huang, Yu Qiao, Ziwei Liu
cs.AI
Abstract
Recent advancements in visual generative models have enabled high-quality
image and video generation, opening diverse applications. However, evaluating
these models often demands sampling hundreds or thousands of images or videos,
making the process computationally expensive, especially for diffusion-based
models with inherently slow sampling. Moreover, existing evaluation methods
rely on rigid pipelines that overlook specific user needs and provide numerical
results without clear explanations. In contrast, humans can quickly form
impressions of a model's capabilities by observing only a few samples. To mimic
this, we propose the Evaluation Agent framework, which employs human-like
strategies for efficient, dynamic, multi-round evaluations using only a few
samples per round, while offering detailed, user-tailored analyses. It offers
four key advantages: 1) efficiency, 2) promptable evaluation tailored to
diverse user needs, 3) explainability beyond single numerical scores, and 4)
scalability across various models and tools. Experiments show that Evaluation
Agent reduces evaluation time to 10% of traditional methods while delivering
comparable results. The Evaluation Agent framework is fully open-sourced to
advance research in visual generative models and their efficient evaluation.
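The abstract describes a dynamic, multi-round evaluation loop that draws only a few samples per round and stops once the model's capability estimate stabilizes. A minimal sketch of such a loop is below; note that all names here (`EvalRound`, `plan_round`, `sample`, `score`, the stopping rule) are illustrative assumptions, not the paper's actual API or algorithm.

```python
# Hypothetical sketch of a multi-round, few-sample evaluation loop in the
# spirit of the Evaluation Agent framework. Function names and the
# stopping criterion are assumptions for illustration only.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalRound:
    prompt: str
    scores: List[float]


def evaluate_agent(
    user_query: str,
    plan_round: Callable[[str, List[EvalRound]], str],  # picks the next probing prompt
    sample: Callable[[str, int], List[object]],         # generates a few samples
    score: Callable[[List[object]], List[float]],       # scores those samples
    samples_per_round: int = 4,
    max_rounds: int = 5,
    stop_margin: float = 0.05,
) -> List[EvalRound]:
    """Run evaluation rounds until scores stabilize or the budget is spent."""
    history: List[EvalRound] = []
    for _ in range(max_rounds):
        prompt = plan_round(user_query, history)
        samples = sample(prompt, samples_per_round)
        history.append(EvalRound(prompt, score(samples)))
        # Stop early once two consecutive round means agree within stop_margin,
        # mimicking a human observer who stops once the impression is stable.
        if len(history) >= 2:
            m1 = sum(history[-1].scores) / len(history[-1].scores)
            m2 = sum(history[-2].scores) / len(history[-2].scores)
            if abs(m1 - m2) < stop_margin:
                break
    return history
```

With stub planner, sampler, and scorer functions, the loop terminates after the second round because consecutive means agree, so only `2 * samples_per_round` samples are drawn instead of hundreds, which is the efficiency argument the abstract makes.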