LiveVQA: Live Visual Knowledge Seeking
April 7, 2025
Authors: Mingyang Fu, Yuyang Peng, Benlin Liu, Yao Wan, Dongping Chen
cs.AI
Abstract
We introduce LiveVQA, an automatically collected dataset of the latest visual
knowledge from the Internet with synthesized VQA problems. LiveVQA consists of
3,602 single- and multi-hop visual questions from 6 news websites across 14
news categories, featuring high-quality image-text coherence and authentic
information. Our evaluation across 15 MLLMs (e.g., GPT-4o, Gemma-3, and
Qwen-2.5-VL family) demonstrates that stronger models perform better overall,
with advanced visual reasoning capabilities proving crucial for complex
multi-hop questions. Despite excellent performance on textual problems, models
with tools like search engines still show significant gaps when addressing
visual questions requiring the latest visual knowledge, highlighting important
areas for future research.