LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations
December 11, 2024
Authors: Zejian Li, Chenye Meng, Yize Li, Ling Yang, Shengyuan Zhang, Jiarui Ma, Jiayi Li, Guang Yang, Changyuan Yang, Zhiyuan Yang, Jinxiong Chang, Lingyun Sun
cs.AI
Abstract
Recent advances in text-to-image (T2I) generation have shown remarkable
success in producing high-quality images from text. However, existing T2I
models show degraded performance in compositional image generation involving
multiple objects and intricate relationships. We attribute this problem to
limitations in existing datasets of image-text pairs, which provide only
prompts and lack precise annotations of inter-object relationships. To address
this problem, we construct LAION-SG, a large-scale dataset with high-quality
structural annotations in the form of scene graphs (SG), which precisely
describe the attributes and relationships of multiple objects and effectively
represent the semantic structure of complex scenes. Based on LAION-SG, we train
a new foundation model, SDXL-SG, to incorporate structural annotation
information into the generation process. Extensive experiments show that
advanced models trained on LAION-SG achieve significant performance
improvements in complex scene generation over models trained on existing
datasets. We also introduce CompSG-Bench, a benchmark that evaluates models on
compositional image generation, establishing a new standard for this domain.
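
To make the idea of a scene-graph annotation concrete, the following is a minimal illustrative sketch of how objects, attributes, and relationships might be represented alongside an image. The class names, field names, and the `describe` helper are hypothetical and chosen only for illustration; they are not the LAION-SG annotation schema, which is not specified in the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneObject:
    """One object in the scene, with free-form attributes (hypothetical layout)."""
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    """A directed edge between two objects, referenced by index."""
    subject: int    # index into SceneGraph.objects
    predicate: str  # e.g. "sitting on"
    target: int     # index into SceneGraph.objects


@dataclass
class SceneGraph:
    objects: List[SceneObject]
    relations: List[Relation]

    def triples(self) -> List[Tuple[str, str, str]]:
        """Return (subject, predicate, object) name triples."""
        return [
            (self.objects[r.subject].name, r.predicate, self.objects[r.target].name)
            for r in self.relations
        ]


def describe(graph: SceneGraph) -> List[str]:
    """Render each relation as a phrase that includes object attributes."""
    def phrase(obj: SceneObject) -> str:
        return " ".join(obj.attributes + [obj.name])

    return [
        f"{phrase(graph.objects[r.subject])} {r.predicate} {phrase(graph.objects[r.target])}"
        for r in graph.relations
    ]


# A toy annotation: two objects with attributes and one relationship.
graph = SceneGraph(
    objects=[
        SceneObject("cat", ["black"]),
        SceneObject("sofa", ["red", "leather"]),
    ],
    relations=[Relation(subject=0, predicate="sitting on", target=1)],
)
```

Unlike a flat caption, this structure binds each attribute to a specific object and each relationship to a specific ordered pair of objects, which is the kind of information the abstract says prompt-only datasets lack.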