Distance correlation is a novel multivariate dependence measure that takes values between 0 and 1 and applies to random vectors of arbitrary, not necessarily equal, dimensions. It offers several advantages over the well-known Pearson correlation coefficient, the most important being that distance correlation equals zero if and only if the random vectors are independent. Two estimators of the distance correlation are available in the literature. The first, proposed by Székely et al. (2007), is based on an asymptotically unbiased estimator of the distance covariance, which turns out to be a V-statistic. The second builds on an unbiased estimator of the distance covariance proposed in Székely et al. (2014), which Székely and Huo (2016) proved to be a U-statistic. This study evaluates the efficiency (mean squared error) of the two estimators and compares their computational times under different dependence structures. Under independence or near-independence, the V-estimates are biased, while the U-estimator frequently cannot be computed because the squared estimate can be negative, leaving its square root undefined. To address this, a convex linear combination of the two estimators is proposed and studied, yielding good results regardless of the level of dependence.
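To make the two estimators concrete, the following is a minimal NumPy sketch of the squared distance covariance as a V-statistic (double-centered distance matrices) and as a U-statistic (U-centered distance matrices), together with the corresponding squared distance correlations. The function names are illustrative and not taken from the paper. The helper `dcor2_combined` and its weight `lam` are assumptions: the abstract does not say how the convex weight is chosen or at which level the estimators are combined, so that function only shows the general form of a convex combination.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix for a sample of shape (n,) or (n, p)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def double_center(d):
    """Double centering used by the V-statistic estimator."""
    row = d.mean(axis=1, keepdims=True)
    col = d.mean(axis=0, keepdims=True)
    return d - row - col + d.mean()


def u_center(d):
    """U-centering used by the unbiased estimator (requires n >= 4)."""
    n = d.shape[0]
    row = d.sum(axis=1, keepdims=True) / (n - 2)
    col = d.sum(axis=0, keepdims=True) / (n - 2)
    total = d.sum() / ((n - 1) * (n - 2))
    centered = d - row - col + total
    np.fill_diagonal(centered, 0.0)  # diagonal is zero by definition
    return centered


def dcov2_v(x, y):
    """Squared distance covariance, V-statistic version (never negative in theory)."""
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))
    return (a * b).mean()


def dcov2_u(x, y):
    """Squared distance covariance, U-statistic version (can be negative)."""
    a = u_center(pairwise_distances(x))
    b = u_center(pairwise_distances(y))
    n = a.shape[0]
    return (a * b).sum() / (n * (n - 3))


def dcor2(x, y, dcov2):
    """Squared distance correlation built from the chosen dcov2 estimator."""
    denom = np.sqrt(dcov2(x, x) * dcov2(y, y))
    return dcov2(x, y) / denom if denom > 0 else 0.0


def dcor2_combined(x, y, lam=0.5):
    """Hypothetical convex combination; lam is a placeholder weight in [0, 1]."""
    return lam * dcor2(x, y, dcov2_u) + (1 - lam) * dcor2(x, y, dcov2_v)


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))   # 3-dimensional sample
y = rng.normal(size=(50, 2))   # independent 2-dimensional sample
v_est = dcor2(x, y, dcov2_v)   # biased upward under independence
u_est = dcor2(x, y, dcov2_u)   # may come out negative
print(v_est, u_est, dcor2_combined(x, y))
```

When `u_est` is negative, taking its square root to obtain the distance correlation itself is not possible, which is the computational failure described above for the U-estimator under independence or near-independence.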