Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark

March 13, 2025
Authors: Viktor Moskvoretskii, Alina Lobanova, Ekaterina Neminova, Chris Biemann, Alexander Panchenko, Irina Nikishina
cs.AI

Abstract

This paper explores the feasibility of using text-to-image models in a zero-shot setup to generate images for taxonomy concepts. While text-based methods for taxonomy enrichment are well established, the potential of the visual dimension remains unexplored. To address this, we propose a comprehensive benchmark for Taxonomy Image Generation that assesses models' abilities to understand taxonomy concepts and generate relevant, high-quality images. The benchmark includes common-sense and randomly sampled WordNet concepts, alongside LLM-generated predictions. We evaluate 12 models using 9 novel taxonomy-related text-to-image metrics and human feedback. Moreover, we pioneer the use of pairwise evaluation with GPT-4 feedback for image generation. Experimental results show that the ranking of models differs significantly from that on standard text-to-image (T2I) tasks. Playground-v2 and FLUX consistently outperform the other models across metrics and subsets, while the retrieval-based approach performs poorly. These findings highlight the potential for automating the curation of structured data resources.
