TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation

February 11, 2025
Authors: Alex Jinpeng Wang, Dongxing Mao, Jiawei Zhang, Weiming Han, Zhuobai Dong, Linjie Li, Yiqi Lin, Zhengyuan Yang, Libo Qin, Fuwei Zhang, Lijuan Wang, Min Li
cs.AI

Abstract

Text-conditioned image generation has gained significant attention in recent years, and models are processing increasingly long and comprehensive text prompts. In everyday life, dense and intricate text appears in contexts such as advertisements, infographics, and signage, where integrating text and visuals is essential for conveying complex information. Despite these advances, generating images that contain long-form text remains a persistent challenge, largely because existing datasets often focus on shorter and simpler text. To address this gap, we introduce TextAtlas5M, a novel dataset specifically designed to evaluate long-text rendering in text-conditioned image generation. Our dataset consists of 5 million long-text images, both generated and collected, spanning diverse data types, which enables comprehensive evaluation of large-scale generative models on long-text image generation. We further curate TextAtlasEval, a human-improved test set of 3,000 samples across 3 data domains, establishing one of the most extensive benchmarks for text-conditioned generation. Evaluations suggest that TextAtlasEval presents significant challenges even for the most advanced proprietary models (e.g., GPT4o with DallE-3), while their open-source counterparts show an even larger performance gap. This evidence positions TextAtlas5M as a valuable dataset for training and evaluating future generations of text-conditioned image generation models.