We present a state-of-the-art report on visualization corpora in automated chart analysis research. We survey 56 papers that created or used a visualization corpus as input to their research techniques or systems. Based on a multi-level task taxonomy that identifies the goals, methods, and outputs of automated chart analysis, we examine the property space of existing chart corpora along five dimensions: format, scope, collection method, annotations, and diversity. Through the survey, we summarize common patterns and practices in creating chart corpora, identify research gaps and opportunities, and discuss the desired properties of future benchmark corpora and the tools required to create them.