Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models

Abstract

Recent advancements in visual generative models have enabled high-quality image and video generation, opening up diverse applications. However, evaluating these models often demands sampling hundreds or thousands of images or videos, which makes the process computationally expensive, especially for diffusion-based models with inherently slow sampling. Moreover, existing evaluation methods rely on rigid pipelines that overlook specific user needs and provide numerical results without clear explanations. In contrast, humans can quickly form impressions of a model's capabilities by observing only a few samples. To mimic this, we propose the Evaluation Agent framework, which employs human-like strategies for efficient, dynamic, multi-round evaluations using only a few samples per round, while offering detailed, user-tailored analyses. It offers four key advantages: 1) efficiency, 2) promptable evaluation tailored to diverse user needs, 3) explainability beyond single numerical scores, and 4) scalability across various models and tools. Experiments show that Evaluation Agent reduces evaluation time to 10% of the time traditional methods require while delivering comparable results. The Evaluation Agent framework is fully open-sourced to advance research in visual generative models and their efficient evaluation.
