We discuss the inadequacy of covariances, correlations, and other L2 measures as relative distance metrics under some conditions. We propose a computationally simple heuristic that transforms a map based on standard principal component analysis (PCA) (when the variables are asymptotically Gaussian) into an entropy-based map in which distances are based on mutual information (MI). Rescaling principal-component-based distances using MI allows a representation of relative statistical associations when, as in genetics, the method is applied to the mutual information, measured in bits, between individuals' genomes. This entropy-rescaled PCA preserves order relationships (along a dimension) but changes the relative distances so that they are linear in information. We show the effect on the entire world population and on some subsamples; the resulting maps differ significantly from the results of current research.
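To illustrate why correlation-scale distances are not linear in information, the following minimal sketch uses the standard identity for a bivariate Gaussian pair, MI = -(1/2) log2(1 - rho^2) bits. It is an illustration under that Gaussian assumption, not the paper's implementation; the function names `gaussian_mi_bits` and `signed_mi_rescale` are hypothetical, and the signed rescaling is only one simple way to keep ordering along an axis while making distances proportional to information.

```python
import math


def gaussian_mi_bits(rho: float) -> float:
    """Mutual information (in bits) of a bivariate Gaussian pair
    with correlation rho, using MI = -0.5 * log2(1 - rho**2)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log2(1.0 - rho * rho)


def signed_mi_rescale(rho: float) -> float:
    """Hypothetical order-preserving rescaling: keep the sign of rho,
    replace its magnitude by the corresponding MI in bits."""
    return math.copysign(gaussian_mi_bits(rho), rho)


if __name__ == "__main__":
    # Equal steps in correlation correspond to very unequal steps in information.
    for rho in (0.1, 0.5, 0.9, 0.99):
        print(f"rho={rho:5.2f}  MI={gaussian_mi_bits(rho):.4f} bits")
```

Under this identity, a correlation of 0.9 carries roughly 1.20 bits against roughly 0.21 bits for a correlation of 0.5: almost six times the information for 1.8 times the correlation. Because the mapping is monotone in rho, the ordering of points along a dimension is unchanged; only the spacing between them changes.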