Visualizations play a pivotal role in daily communication in an increasingly data-driven world. Research on multimodal large language models (MLLMs) for automated chart understanding has accelerated massively, with steady improvements on standard benchmarks. However, for MLLMs to be reliable, they must be robust to misleading visualizations, i.e., charts that distort the underlying data, leading readers to draw inaccurate conclusions. Here, we uncover an important vulnerability: MLLM question-answering (QA) accuracy on misleading visualizations drops on average to the level of the random baseline. To address this, we provide the first comparison of six inference-time methods to improve QA performance on misleading visualizations, without compromising accuracy on non-misleading ones. We find that two methods, table-based QA and redrawing the visualization, are effective, with improvements of up to 19.6 percentage points. We make our code and data available.