PosterSum: A Multimodal Benchmark for Scientific Poster Summarization
February 24, 2025
Authors: Rohit Saxena, Pasquale Minervini, Frank Keller
cs.AI
Abstract
Generating accurate and concise textual summaries from multimodal documents
is challenging, especially when dealing with visually complex content like
scientific posters. We introduce PosterSum, a novel benchmark to advance the
development of vision-language models that can understand and summarize
scientific posters into research paper abstracts. Our dataset contains 16,305
conference posters paired with their corresponding abstracts as summaries. Each
poster is provided in image format and presents diverse visual understanding
challenges, such as complex layouts, dense text regions, tables, and figures.
We benchmark state-of-the-art Multimodal Large Language Models (MLLMs) on
PosterSum and demonstrate that they struggle to accurately interpret and
summarize scientific posters. We propose Segment & Summarize, a hierarchical
method that outperforms current MLLMs on automated metrics, achieving a 3.14%
gain in ROUGE-L. This will serve as a starting point for future research on
poster summarization.
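The abstract does not spell out how Segment & Summarize works internally, but the name suggests a two-stage hierarchy: partition the poster into regions, summarize each region locally, then fuse the partial summaries. The sketch below illustrates that control flow only; every function here is an illustrative placeholder (the real method would use layout analysis and an MLLM, not these stubs), and none of it is the authors' code.

```python
# Schematic sketch of a hierarchical "segment then summarize" pipeline.
# All functions are illustrative stand-ins, not the method from the paper.

def segment_poster(poster_regions):
    """Stand-in for a layout-analysis step: the real method would detect
    regions (title, text blocks, tables, figures) in the poster image.
    Here we simply pass through pre-extracted region texts."""
    return poster_regions

def summarize_segment(text, max_words=10):
    """Stand-in for a per-region summarization call (e.g. an MLLM prompt).
    Here: naive truncation to the first few words."""
    words = text.split()
    return " ".join(words[:max_words])

def summarize_poster(poster_regions):
    """Hierarchical summary: summarize each segment independently, then
    fuse the partial summaries into one abstract-style paragraph."""
    segments = segment_poster(poster_regions)
    partials = [summarize_segment(s) for s in segments]
    # A second-stage model would rewrite the concatenation; we just join.
    return " ".join(partials)

regions = [
    "Introduction: summarizing visually complex scientific posters is hard",
    "Method: segment the poster, summarize regions, then fuse the results",
]
print(summarize_poster(regions))
```

The point of the hierarchy is that each region (a dense text column, a table, a figure) is small enough for a model to read reliably, sidestepping the whole-poster visual complexity the benchmark exposes.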