Data visualizations are common in the real world. They often appear in sources such as scientific documents, news articles, textbooks, and social media, where they summarize key information in visual form. Charts can also mislead their audience by communicating false information or biasing readers towards a specific agenda. Verifying claims against charts is not straightforward. It requires analyzing both the textual and visual components of the chart, considering characteristics such as colors, positions, and orientations. Moreover, determining whether a claim is supported by the chart content often requires different types of reasoning. To address this challenge, we introduce ChartCheck, a novel dataset for fact-checking against chart images. ChartCheck is the first large-scale dataset of its kind, with 1.7k real-world charts and 10.5k human-written claims and explanations. We evaluated state-of-the-art models on the dataset and achieved an accuracy of 73.9% in the finetuned setting. Additionally, we identified chart characteristics and reasoning types that challenge the models.