In this paper we develop a novel nonparametric framework to test the independence of two random variables $\mathbf{X}$ and $\mathbf{Y}$ with unknown respective marginals $H(dx)$ and $G(dy)$ and joint distribution $F(dx dy)$, based on {\it Receiver Operating Characteristic} (ROC) analysis and bipartite ranking. The rationale behind our approach relies on the fact that the independence hypothesis $\mathcal{H}_0$ is necessarily false as soon as the optimal scoring function related to the pair of distributions $(H\otimes G,\; F)$, obtained from a bipartite ranking algorithm, has a ROC curve that deviates from the main diagonal of the unit square. We consider a wide class of rank statistics encompassing many ways of deviating from the diagonal in the ROC space to build tests of independence. Beyond its great flexibility, this new method has theoretical properties that far surpass those of its competitors. Nonasymptotic bounds for the two types of testing errors are established. From an empirical perspective, the novel procedure we promote in this paper exhibits a remarkable ability to detect small departures, of various types, from the null assumption $\mathcal{H}_0$, even in high dimension, as supported by the numerical experiments presented here.
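The rationale above can be illustrated with a minimal Python sketch. It is not the procedure of the paper, only a toy instance under stated assumptions: $\mathbf{X}$ and $\mathbf{Y}$ are taken to be scalar, a crude k-nearest-neighbour score stands in for the bipartite ranking algorithm, the AUC is used as one simple member of the class of rank statistics, and the test is calibrated by permuting held-out scores. All function names (`make_two_samples`, `knn_score`, `auc`, `independence_test`) and the parameters `k` and `n_perm` are hypothetical.

```python
import math
import random


def make_two_samples(xs, ys):
    """Build a sample from F and a sample from H (x) G out of i.i.d. pairs.

    The first third of the pairs is kept intact (joint law F). X's from the
    second third are paired with Y's from the last third, so each such pair
    has independent coordinates with marginals H and G.
    """
    n = len(xs) // 3
    pos = [(xs[i], ys[i]) for i in range(n)]
    neg = [(xs[n + i], ys[2 * n + i]) for i in range(n)]
    return pos, neg


def knn_score(z, train_pos, train_neg, k):
    """Fraction of the k nearest training points that come from F.

    Assumption: a simple stand-in for a learned scoring function.
    """
    labelled = [(math.dist(z, p), 1) for p in train_pos]
    labelled += [(math.dist(z, q), 0) for q in train_neg]
    labelled.sort(key=lambda t: t[0])
    return sum(label for _, label in labelled[:k]) / k


def auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 1/2)."""
    wins = 0.0
    for s in pos_scores:
        for t in neg_scores:
            wins += 1.0 if s > t else 0.5 if s == t else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def independence_test(xs, ys, k=15, n_perm=200, seed=0):
    """Return (held-out AUC, one-sided permutation p-value).

    Under independence, F = H (x) G, so held-out scores are exchangeable
    and the empirical ROC curve should stay close to the diagonal (AUC near 1/2).
    """
    rng = random.Random(seed)
    pos, neg = make_two_samples(xs, ys)
    half = len(pos) // 2
    train_pos, test_pos = pos[:half], pos[half:]
    train_neg, test_neg = neg[:half], neg[half:]

    # Score the held-out points with the function fitted on the training half.
    sp = [knn_score(z, train_pos, train_neg, k) for z in test_pos]
    sn = [knn_score(z, train_pos, train_neg, k) for z in test_neg]
    observed = auc(sp, sn)

    # Calibrate by reshuffling which held-out scores are labelled "from F".
    pooled = sp + sn
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if auc(pooled[:len(sp)], pooled[len(sp):]) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(0, 1) for _ in range(600)]
    ys_dep = [x + 0.1 * rng.gauss(0, 1) for x in xs]  # strongly dependent on xs
    ys_ind = [rng.gauss(0, 1) for _ in xs]            # independent of xs
    print("dependent:  ", independence_test(xs, ys_dep))
    print("independent:", independence_test(xs, ys_ind))
```

Splitting the data into a training part and a held-out part keeps the scoring function independent of the scores it is evaluated on, which is what makes the permutation calibration in this sketch legitimate. Other summaries of the deviation from the diagonal could replace `auc` without changing the rest of the code.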