ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering

April 7, 2025
Authors: Ahmed Masry, Mohammed Saidul Islam, Mahir Ahmed, Aayush Bajaj, Firoz Kabir, Aaryaman Kartha, Md Tahmid Rahman Laskar, Mizanur Rahman, Shadikur Rahman, Mehrad Shahmohammadi, Megh Thakkar, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty
cs.AI

Abstract

Charts are ubiquitous, as people often use them to analyze data, answer questions, and discover critical insights. However, performing complex analytical tasks with charts requires significant perceptual and cognitive effort. Chart Question Answering (CQA) systems automate this process by enabling models to interpret and reason with visual representations of data. However, existing benchmarks like ChartQA lack real-world diversity and have recently shown performance saturation with modern large vision-language models (LVLMs). To address these limitations, we introduce ChartQAPro, a new benchmark that includes 1,341 charts from 157 diverse sources, spanning various chart types, including infographics and dashboards, and featuring 1,948 questions of diverse types, such as multiple-choice, conversational, hypothetical, and unanswerable questions, to better reflect real-world challenges. Our evaluations with 21 models show a substantial performance drop for LVLMs on ChartQAPro; e.g., Claude Sonnet 3.5 scores 90.5% on ChartQA but only 55.81% on ChartQAPro, underscoring the complexity of chart reasoning. We complement our findings with detailed error analyses and ablation studies, identifying key challenges and opportunities for advancing LVLMs in chart understanding and reasoning. We release ChartQAPro at https://github.com/vis-nlp/ChartQAPro.
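The scores quoted above (e.g., 90.5% vs. 55.81%) are accuracy-style metrics. As a point of reference, the sketch below illustrates the relaxed-accuracy criterion commonly used for ChartQA-style benchmarks: exact match for textual answers and a small relative tolerance for numeric ones. The function names, the 5% tolerance, and the toy data are illustrative assumptions, not the official ChartQAPro evaluation code; the released scripts are available in the GitHub repository linked above.

```python
# Minimal sketch of a relaxed-accuracy scorer in the style commonly used for
# ChartQA-style benchmarks. This is NOT the official ChartQAPro evaluation
# code; names, tolerance, and toy data below are illustrative assumptions.

def relaxed_match(prediction: str, target: str, tolerance: float = 0.05) -> bool:
    """Return True if the prediction matches the target, allowing a small
    relative error for numeric answers and exact (case-insensitive) match
    for textual answers."""
    prediction, target = prediction.strip(), target.strip()
    try:
        pred_num = float(prediction.rstrip("%"))
        target_num = float(target.rstrip("%"))
        if target_num == 0:
            return pred_num == 0
        return abs(pred_num - target_num) / abs(target_num) <= tolerance
    except ValueError:
        # Non-numeric answer: fall back to string comparison.
        return prediction.lower() == target.lower()


def score(predictions: list[str], targets: list[str]) -> float:
    """Percentage of questions answered correctly under the relaxed criterion."""
    correct = sum(relaxed_match(p, t) for p, t in zip(predictions, targets))
    return 100.0 * correct / max(len(targets), 1)


if __name__ == "__main__":
    # Toy example: two numeric answers within the 5% tolerance, one text mismatch.
    print(score(["42.1", "17%", "Europe"], ["42.0", "17.5%", "Asia"]))  # ~66.7
```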
