GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
November 27, 2024
Authors: Pengfei Zhou, Xiaopeng Peng, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang
cs.AI
Abstract
Multimodal Large Language Models (MLLMs) have made significant strides in
visual understanding and generation tasks. However, generating interleaved
image-text content remains a challenge, which requires integrated multimodal
understanding and generation abilities. While the progress in unified models
offers new solutions, existing benchmarks are insufficient for evaluating these
methods due to data size and diversity limitations. To bridge this gap, we
introduce GATE OpenING (OpenING), a comprehensive benchmark comprising 5,400
high-quality human-annotated instances across 56 real-world tasks. OpenING
covers diverse daily scenarios such as travel guides, design, and brainstorming,
offering a robust platform for challenging interleaved generation methods. In
addition, we present IntJudge, a judge model for evaluating open-ended
multimodal generation methods. Trained with a novel data pipeline, our IntJudge
achieves an agreement rate of 82.42% with human judgments, outperforming
GPT-based evaluators by 11.34%. Extensive experiments on OpenING reveal that
current interleaved generation methods still have substantial room for
improvement. Key findings on interleaved image-text generation are further
presented to guide the development of next-generation models. OpenING is
open-sourced at https://opening.github.io.
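To make the "agreement rate with human judgments" concrete, here is a minimal sketch of one common way such a figure can be computed: the fraction of pairwise comparisons on which a judge model and a human annotator reach the same verdict. The abstract does not specify the exact protocol (for example, how ties are handled), so the function name `agreement_rate`, the verdict labels, and the sample data below are all hypothetical illustrations, not the authors' implementation.

```python
from typing import Sequence


def agreement_rate(judge: Sequence[str], human: Sequence[str]) -> float:
    """Return the fraction of items where the judge verdict equals the human verdict.

    Assumption: each item is one pairwise comparison with a verdict label
    such as "A", "B", or "tie". This is an illustrative definition only.
    """
    if len(judge) != len(human):
        raise ValueError("judge and human verdict lists must have equal length")
    if not judge:
        raise ValueError("at least one verdict is required")
    matches = sum(j == h for j, h in zip(judge, human))
    return matches / len(judge)


# Hypothetical verdicts for five pairwise comparisons (not data from the paper).
human_verdicts = ["A", "B", "A", "tie", "B"]
judge_verdicts = ["A", "B", "B", "tie", "B"]

rate = agreement_rate(judge_verdicts, human_verdicts)
print(f"{rate:.2%}")  # 80.00%
```

Under this definition, comparing two judges against the same human labels gives a direct way to read a statement like "outperforming GPT-based evaluators": each judge gets its own rate, and the rates are compared.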