ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering
April 7, 2025
Authors: Ahmed Masry, Mohammed Saidul Islam, Mahir Ahmed, Aayush Bajaj, Firoz Kabir, Aaryaman Kartha, Md Tahmid Rahman Laskar, Mizanur Rahman, Shadikur Rahman, Mehrad Shahmohammadi, Megh Thakkar, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty
cs.AI
Abstract
Charts are ubiquitous, as people often use them to analyze data, answer
questions, and discover critical insights. However, performing complex
analytical tasks with charts requires significant perceptual and cognitive
effort. Chart Question Answering (CQA) systems automate this process by
enabling models to interpret and reason with visual representations of data.
However, existing benchmarks like ChartQA lack real-world diversity and have
recently shown performance saturation with modern large vision-language models
(LVLMs). To address these limitations, we introduce ChartQAPro, a new benchmark
that includes 1,341 charts from 157 diverse sources, spanning various chart
types, including infographics and dashboards, and featuring 1,948 questions of
various types, such as multiple-choice, conversational, hypothetical, and
unanswerable questions, to better reflect real-world challenges. Our
evaluations with 21 models show a substantial performance drop for LVLMs on
ChartQAPro; e.g., Claude Sonnet 3.5 scores 90.5% on ChartQA but only 55.81% on
ChartQAPro, underscoring the complexity of chart reasoning. We complement our
findings with detailed error analyses and ablation studies, identifying key
challenges and opportunities for advancing LVLMs in chart understanding and
reasoning. We release ChartQAPro at https://github.com/vis-nlp/ChartQAPro.
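To make the size of the reported gap concrete, the short sketch below works through the arithmetic for the one example the abstract quotes (Claude Sonnet 3.5: 90.5% on ChartQA, 55.81% on ChartQAPro). The function name `performance_drop` and the variable names are illustrative assumptions, not part of the ChartQAPro release; only the two scores come from the abstract.

```python
# Scores quoted in the abstract (Claude Sonnet 3.5).
chartqa_score = 90.5      # accuracy on ChartQA, in percent
chartqapro_score = 55.81  # accuracy on ChartQAPro, in percent


def performance_drop(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent).

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    absolute = before - after
    relative = 100.0 * absolute / before
    return absolute, relative


absolute_drop, relative_drop = performance_drop(chartqa_score, chartqapro_score)
print(f"Absolute drop: {absolute_drop:.2f} points")  # 34.69 points
print(f"Relative drop: {relative_drop:.1f}%")        # 38.3%
```

In other words, the quoted model loses about 34.7 percentage points, roughly 38% of its ChartQA score, when moved to ChartQAPro.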