LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations

Abstract

Recent advances in text-to-image (T2I) generation have shown remarkable success in producing high-quality images from text. However, the performance of existing T2I models degrades in compositional image generation involving multiple objects and intricate relationships. We attribute this problem to limitations of existing image-text pair datasets, which provide only prompts and lack precise annotations of inter-object relationships. To address this problem, we construct LAION-SG, a large-scale dataset with high-quality structural annotations in the form of scene graphs (SG), which precisely describe the attributes and relationships of multiple objects and effectively represent the semantic structure of complex scenes. Based on LAION-SG, we train a new foundation model, SDXL-SG, that incorporates structural annotation information into the generation process. Extensive experiments show that advanced models trained on LAION-SG achieve significant performance improvements in complex scene generation over models trained on existing datasets. We also introduce CompSG-Bench, a benchmark that evaluates models on compositional image generation, establishing a new standard for this domain.
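
To make the idea of a scene-graph annotation concrete, the following is a minimal sketch of how objects, their attributes, and inter-object relationships might be represented. The class names (`SceneObject`, `Relation`, `SceneGraph`), the field layout, and the example scene are all hypothetical and chosen only for illustration; they are not the actual LAION-SG annotation schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical structures for illustration only; not the LAION-SG schema.
@dataclass
class SceneObject:
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    subject: int    # index into SceneGraph.objects
    predicate: str  # e.g. "sitting on"
    object: int     # index into SceneGraph.objects


@dataclass
class SceneGraph:
    objects: List[SceneObject]
    relations: List[Relation]

    def triples(self) -> List[Tuple[str, str, str]]:
        """Return (subject, predicate, object) triples with attributes prepended."""
        def label(index: int) -> str:
            obj = self.objects[index]
            return " ".join([*obj.attributes, obj.name])

        return [
            (label(r.subject), r.predicate, label(r.object))
            for r in self.relations
        ]


# An invented example scene with three objects and two relationships.
graph = SceneGraph(
    objects=[
        SceneObject("cat", ["black"]),
        SceneObject("sofa", ["red"]),
        SceneObject("lamp"),
    ],
    relations=[
        Relation(0, "sitting on", 1),
        Relation(2, "next to", 1),
    ],
)

for triple in graph.triples():
    print(triple)
```

Unlike a free-form prompt, a representation of this kind binds each attribute to a specific object and each relationship to a specific subject-object pair, which is the kind of structure the abstract says plain image-text pairs lack.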

