IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models

January 23, 2025
Authors: Jiayi Lei, Renrui Zhang, Xiangfei Hu, Weifeng Lin, Zhen Li, Wenjian Sun, Ruoyi Du, Le Zhuo, Zhongyu Li, Xinyue Li, Shitian Zhao, Ziyu Guo, Yiting Lu, Peng Gao, Hongsheng Li
cs.AI

Abstract

With the rapid development of diffusion models, text-to-image (T2I) models have made significant progress, showing impressive abilities in prompt following and image generation. Recently launched models such as FLUX.1 and Ideogram2.0, along with others like Dall-E3 and Stable Diffusion 3, have demonstrated exceptional performance across various complex tasks, raising the question of whether T2I models are moving towards general-purpose applicability. Beyond traditional image generation, these models exhibit capabilities across a range of fields, including controllable generation, image editing, video, audio, 3D, and motion generation, as well as computer vision tasks such as semantic segmentation and depth estimation. However, current evaluation frameworks are insufficient to comprehensively assess these models' performance across these expanding domains. To evaluate them thoroughly, we developed IMAGINE-E and tested six prominent models: FLUX.1, Ideogram2.0, Midjourney, Dall-E3, Stable Diffusion 3, and Jimeng. Our evaluation is divided into five key domains: structured output generation, realism and physical consistency, specific domain generation, challenging scenario generation, and multi-style creation tasks. This assessment highlights each model's strengths and limitations, particularly the outstanding performance of FLUX.1 and Ideogram2.0 in structured and specific domain tasks, and underscores the expanding applications and potential of T2I models as foundational AI tools. The study provides insight into the current state and future trajectory of T2I models as they evolve towards general-purpose usability. Evaluation scripts will be released at https://github.com/jylei16/Imagine-e.
