GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
November 27, 2024
Authors: Pengfei Zhou, Xiaopeng Peng, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang
cs.AI
Abstract
Multimodal Large Language Models (MLLMs) have made significant strides in
visual understanding and generation tasks. However, generating interleaved
image-text content remains challenging, as it requires integrated multimodal
understanding and generation abilities. While progress in unified models
offers new solutions, existing benchmarks are insufficient for evaluating these
methods due to their limited data size and diversity. To bridge this gap, we
introduce GATE OpenING (OpenING), a comprehensive benchmark comprising 5,400
high-quality human-annotated instances across 56 real-world tasks. OpenING
covers diverse daily scenarios such as travel guides, design, and brainstorming,
offering a robust platform for challenging interleaved generation methods. In
addition, we present IntJudge, a judge model for evaluating open-ended
multimodal generation methods. Trained with a novel data pipeline, our IntJudge
achieves an agreement rate of 82.42% with human judgments, outperforming
GPT-based evaluators by 11.34%. Extensive experiments on OpenING reveal that
current interleaved generation methods still have substantial room for
improvement. We further present key findings on interleaved image-text
generation to guide the development of next-generation models. OpenING is
open-sourced at https://opening.github.io.
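The agreement rate reported for IntJudge can be read as the fraction of evaluated comparisons on which the judge model's verdict matches the human verdict. The sketch below assumes each comparison is a pairwise verdict encoded as the strings "A", "B", or "tie". The function name `agreement_rate`, this verdict encoding, and the toy verdicts are hypothetical illustrations, not IntJudge's actual evaluation code. In particular, the benchmark may handle ties differently.

```python
from typing import Sequence


def agreement_rate(judge: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of items where the judge's verdict equals the human verdict.

    Hypothetical helper: verdicts are assumed to be "A", "B", or "tie"
    for a pairwise comparison between two models' interleaved outputs.
    """
    if len(judge) != len(human):
        raise ValueError("judge and human verdict lists must have the same length")
    if not judge:
        raise ValueError("need at least one verdict")
    # Count positions where both raters picked the same outcome.
    matches = sum(j == h for j, h in zip(judge, human))
    return matches / len(judge)


# Toy, made-up verdicts for five comparisons (not data from OpenING).
human_verdicts = ["A", "B", "A", "A", "B"]
judge_verdicts = ["A", "B", "B", "A", "B"]
print(f"{agreement_rate(judge_verdicts, human_verdicts):.2%}")  # 80.00%
```

Under this reading, a gap between two judges (such as IntJudge versus a GPT-based evaluator) is simply the difference between their agreement rates computed against the same set of human verdicts.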