This study investigates whether distance variance, a validated measure of spread for continuous and binary variables [Edelmann et al., 2020, Ann. Stat., 48(6)], can be extended to quantify the spread of general categorical variables. We provide both geometric and algebraic characterizations of distance variance, revealing its connections to several commonly used entropy measures and to the variance-covariance matrix of the one-hot encoded representation. However, we demonstrate that distance variance fails to satisfy the Schur-concavity axiom for categorical variables with more than two categories, which leads to counterintuitive results. This failure limits its suitability as a universal measure of spread.
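The Schur-concavity failure can be illustrated numerically with a minimal sketch. It is not taken from the paper: it assumes the discrete metric d(x, y) = 1 if x ≠ y and 0 otherwise on the categories, and the standard population formula for squared distance variance, E[d(X, X')²] + (E[d(X, X')])² − 2E[d(X, X')d(X, X'')], where X, X', X'' are independent copies. The function name `distance_variance_sq` is hypothetical.

```python
from fractions import Fraction


def distance_variance_sq(probs):
    """Squared distance variance of a categorical distribution.

    Assumption (not from the source): categories carry the discrete
    metric d(i, j) = 1 if i != j else 0.
    """
    k = len(probs)

    def d(i, j):
        return 0 if i == j else 1

    # E[d(X, X')^2] and E[d(X, X')] over two independent copies
    e_d2 = sum(probs[i] * probs[j] * d(i, j) ** 2
               for i in range(k) for j in range(k))
    e_d = sum(probs[i] * probs[j] * d(i, j)
              for i in range(k) for j in range(k))
    # E[d(X, X') d(X, X'')] over three independent copies
    e_dd = sum(probs[i] * probs[j] * probs[l] * d(i, j) * d(i, l)
               for i in range(k) for j in range(k) for l in range(k))
    return e_d2 + e_d ** 2 - 2 * e_dd


# Exact rational arithmetic avoids floating-point noise in the comparison.
uniform = [Fraction(1, 3)] * 3
two_point = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]

print(distance_variance_sq(uniform))    # 2/9
print(distance_variance_sq(two_point))  # 1/4
```

Under these assumptions, the uniform distribution on three categories scores 2/9, while the distribution that puts mass 1/2 on each of two categories and none on the third scores 1/4. The uniform distribution is majorized by every other distribution on the same categories, so a Schur-concave measure would have to rank it highest; here it does not.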