We present results from a preregistered, crowdsourced user study in which we asked members of the general population to judge whether two samples, shown using different forms of data visualization, were drawn from the same population or from different populations. The task reduces to assessing whether the overlap between the two visualized samples is large enough to suggest a common origin. When the samples are shown as idealized normal curves fitted to the data, the task is essentially a graphical formulation of the classic Student's t-test. We speculated that more sophisticated visual representations, such as bar histograms, Wilkinson dot plots, strip plots, or Tukey boxplots, would allow people both to be more accurate at this task and to better understand its meaning. In other words, the purpose of our study is to explore which visualization best scaffolds novices in making graphical inferences about data. Contrary to this expectation, our results indicate that the more abstract, idealized bell curve representation yields higher accuracy.
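For readers unfamiliar with the test that the bell curve condition mirrors, the following is a minimal sketch of the pooled-variance Student's t statistic for two samples. It is illustrative only and is not taken from the study materials: the function name `pooled_t_statistic` and the two sample lists are hypothetical, and equal population variances are assumed.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic, assuming equal population variances."""
    na, nb = len(a), len(b)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Unbiased sample variances (divide by n - 1).
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # Pool the two variances, weighting each by its degrees of freedom.
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean_a - mean_b) / std_err


# Hypothetical samples, made up for illustration only.
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [3.0, 4.0, 5.0, 6.0, 7.0]

t = pooled_t_statistic(sample_a, sample_b)
print(f"t = {t:.3f} with {len(sample_a) + len(sample_b) - 2} degrees of freedom")
```

A large absolute value of t corresponds to two fitted curves with little overlap relative to their spread, which is the visual cue participants would rely on in the graphical version of the task.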