We provide a pipeline for calculating, managing and visualising correlations and other pairwise scores for numerical and categorical data. We present a uniform interface for calculating many kinds of pairwise scores and a new tidy data structure for managing the results. We also provide new visualisations that show multiple and/or grouped pairwise scores simultaneously. These visualisations are richer than a traditional heatmap of correlation scores: they help identify relationships involving categorical variables, numeric variable pairs with non-linear associations, and pairs that exhibit Simpson's paradox. These methods are available in our R package bullseye.
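To make the idea of a tidy structure for pairwise scores concrete, the following is a minimal sketch in plain Python, not the bullseye API. The function names (`pearson`, `spearman`, `pairwise_table`) and the column names (`x`, `y`, `score`, `value`) are assumptions chosen for illustration; the package's own interface and column layout may differ. The sketch stores one row per variable pair and score, so several scores for the same pair sit side by side in long form.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


# Hypothetical registry of scores; any function of two columns could be added.
SCORES = {"pearson": pearson, "spearman": spearman}


def pairwise_table(columns, scores=SCORES):
    """Return one row per (variable pair, score) in long, tidy form."""
    rows = []
    for x, y in combinations(sorted(columns), 2):
        for name, fn in scores.items():
            rows.append({"x": x, "y": y, "score": name,
                         "value": fn(columns[x], columns[y])})
    return rows


# Toy data: b is a monotone but non-linear function of a.
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.0, 4.0, 9.0, 16.0, 25.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
}
table = pairwise_table(data)
for row in table:
    print(row)
```

In this toy data the Spearman score for the pair (a, b) is higher than the Pearson score, which is the kind of discrepancy between scores that can flag a non-linear association. A grouping column could be added to each row in the same way to hold scores computed within subsets of the data.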