LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations

Abstract

Recent advances in text-to-image (T2I) generation have shown remarkable success in producing high-quality images from text. However, the performance of existing T2I models degrades in compositional image generation involving multiple objects and intricate relationships. We attribute this problem to limitations in existing datasets of image-text pairs, which provide only prompts and lack precise annotations of inter-object relationships. To address this problem, we construct LAION-SG, a large-scale dataset with high-quality structural annotations in the form of scene graphs (SG), which precisely describe the attributes and relationships of multiple objects and effectively represent the semantic structure of complex scenes. Based on LAION-SG, we train a new foundation model, SDXL-SG, that incorporates structural annotation information into the generation process. Extensive experiments show that advanced models trained on LAION-SG achieve significant performance improvements in complex scene generation over models trained on existing datasets. We also introduce CompSG-Bench, a benchmark that evaluates models on compositional image generation, establishing a new standard for this domain.
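
To make the notion of a scene graph annotation concrete, the following is a minimal Python sketch of one way such a structure could be represented: objects carrying attributes, plus directed relations between them. The class names (`SceneObject`, `SceneGraph`), the field layout, and the example scene are all hypothetical and chosen for illustration; they are not the actual LAION-SG schema or file format.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One object in the scene, with free-form attribute strings (hypothetical layout)."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class SceneGraph:
    """Objects plus directed (subject, predicate, object) relations (hypothetical layout)."""
    objects: list[SceneObject] = field(default_factory=list)
    # Each relation is (subject index, predicate, object index).
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, name: str, *attributes: str) -> int:
        """Append an object and return its index for use in relations."""
        self.objects.append(SceneObject(name, list(attributes)))
        return len(self.objects) - 1

    def relate(self, subject: int, predicate: str, obj: int) -> None:
        """Record a directed relation between two existing objects."""
        for idx in (subject, obj):
            if not 0 <= idx < len(self.objects):
                raise IndexError(f"no object with index {idx}")
        self.relations.append((subject, predicate, obj))

    def triples(self) -> list[str]:
        """Render each relation as readable text, attributes before the object name."""
        def describe(i: int) -> str:
            o = self.objects[i]
            return " ".join([*o.attributes, o.name])
        return [f"{describe(s)} {p} {describe(o)}" for s, p, o in self.relations]


# Illustrative scene, invented for this sketch.
graph = SceneGraph()
cat = graph.add_object("cat", "black")
sofa = graph.add_object("sofa", "red", "leather")
graph.relate(cat, "sitting on", sofa)
print(graph.triples())  # ['black cat sitting on red leather sofa']
```

Unlike a plain prompt, this kind of structure states explicitly which attributes belong to which object and which object is the subject of each relation, which is the information the abstract identifies as missing from caption-only datasets.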
