Distance correlation is a popular measure of dependence between random variables. It has some robustness properties, but not all. We prove that the influence function of the usual distance correlation is bounded, but that its breakdown value is zero. Moreover, its sensitivity function is unbounded, although it converges to the bounded influence function as the sample size increases. To address this sensitivity to outliers, we construct a more robust version of distance correlation, based on a new data transformation. Simulations indicate that the resulting method is quite robust and has good power in the presence of outliers. We illustrate the method on genetic data. Comparing the classical distance correlation with its more robust version provides additional insight.
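To make the outlier sensitivity of the classical estimator concrete, the following minimal sketch computes the usual sample distance correlation for two univariate samples, using double-centered distance matrices and the V-statistic form. It then compares the estimate on independent data with the estimate after appending one far outlier. The function names (`distance_correlation`, `outlier_demo`), the sample size, the seed, and the outlier position are assumptions chosen for illustration. The robust version based on the new data transformation is not reproduced here, since the abstract does not specify it.

```python
import math
import random


def _centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Classical (V-statistic) sample distance correlation of two 1-D samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for row in a for v in row) / n**2
    dvar_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # a constant sample carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / denom)


def outlier_demo(n=50, outlier=1e6, seed=0):
    """Return (clean, contaminated) estimates for independent Gaussian samples.

    The contaminated sample is the clean one plus a single point placed at
    (outlier, outlier); all settings are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    clean = distance_correlation(x, y)
    contaminated = distance_correlation(x + [outlier], y + [outlier])
    return clean, contaminated


if __name__ == "__main__":
    clean, contaminated = outlier_demo()
    print(f"clean: {clean:.3f}  with one outlier: {contaminated:.3f}")
```

In this sketch a single point that is extreme in both coordinates dominates every pairwise distance, so the two centered distance matrices become nearly proportional and the estimate is pushed close to 1 even though the remaining observations are independent.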