Pearson's chi-squared test, though widely used to detect association between categorical variables, has low statistical power in large sparse contingency tables. To address this limitation, two novel permutation tests have recently been developed: the distance covariance permutation test and the U-statistic permutation test. Both are based on the distance covariance functional but use different estimators. In this work, we study key statistical properties of the distance covariance for categorical variables. First, we show that, unlike chi-squared, the distance covariance functional is B-robust for any number of categories (fixed or diverging). Second, we establish the strong consistency of distance covariance screening under mild conditions, and simulations confirm its advantage over chi-squared screening, especially for large sparse tables. Finally, we derive an approximate null distribution for a bias-corrected distance correlation estimate and demonstrate its effectiveness through simulations.
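To make the distance covariance for categorical variables concrete, the following is a minimal sketch, not the estimators studied in the paper. It assumes the discrete metric d(a, b) = 1 if a ≠ b and 0 otherwise, under which the plug-in (V-statistic) squared distance covariance reduces to the sum over cells of (p_ij − p_i+ p_+j)^2. The function names, the choice of the plug-in estimator, and the permutation scheme are illustrative assumptions.

```python
import numpy as np

def contingency(x, y):
    """Cross-tabulate two label sequences into a count table."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)
    return table

def dcov2_from_table(table):
    """Plug-in squared distance covariance under the discrete 0/1 metric
    (assumed here): sum of squared deviations from independence."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    row = p.sum(axis=1, keepdims=True)   # marginal of the row variable
    col = p.sum(axis=0, keepdims=True)   # marginal of the column variable
    return float(((p - row * col) ** 2).sum())

def dcov2_from_distances(x, y):
    """Same quantity via double-centred pairwise 0/1 distance matrices."""
    x, y = np.asarray(x), np.asarray(y)
    a = (x[:, None] != x[None, :]).astype(float)
    b = (y[:, None] != y[None, :]).astype(float)
    a_c = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    b_c = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    return float((a_c * b_c).mean())

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value: shuffle y to break any association with x."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = dcov2_from_table(contingency(x, y))
    exceed = sum(
        dcov2_from_table(contingency(x, rng.permutation(y))) >= observed
        for _ in range(n_perm)
    )
    return (1 + exceed) / (1 + n_perm)
```

The table-based form needs only the cell counts, so it avoids building n × n distance matrices; the distance-based form is included only to show that the two agree under the assumed metric.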