
LiveVQA: Live Visual Knowledge Seeking

April 7, 2025
Authors: Mingyang Fu, Yuyang Peng, Benlin Liu, Yao Wan, Dongping Chen
cs.AI

Abstract

We introduce LiveVQA, an automatically collected dataset of the latest visual knowledge from the Internet with synthesized VQA problems. LiveVQA consists of 3,602 single- and multi-hop visual questions from 6 news websites across 14 news categories, featuring high-quality image-text coherence and authentic information. Our evaluation across 15 MLLMs (e.g., GPT-4o, Gemma-3, and the Qwen-2.5-VL family) demonstrates that stronger models perform better overall, with advanced visual reasoning capabilities proving crucial for complex multi-hop questions. Despite excellent performance on textual problems, models with tools like search engines still show significant gaps when addressing visual questions requiring the latest visual knowledge, highlighting important areas for future research.
