We introduce kernel integrated $R^2$, a new measure of statistical dependence that combines the local normalization principle of the recently introduced integrated $R^2$ with the flexibility of reproducing kernel Hilbert spaces (RKHSs). The proposed measure extends integrated $R^2$ from scalar responses to responses taking values in general spaces equipped with a characteristic kernel. This allows it to measure dependence for multivariate, functional, and structured data while remaining sensitive to tail behaviour and oscillatory dependence structures. We establish that the new measure (i) takes values in $[0,1]$, (ii) equals zero if and only if the response and the covariates are independent, and (iii) equals one if and only if the response is almost surely a measurable function of the covariates. We propose two estimators: a graph-based method using $K$-nearest neighbours and an RKHS-based method built on conditional mean embeddings. We prove consistency and derive convergence rates for the graph-based estimator, showing that it adapts to intrinsic dimensionality. Numerical experiments on simulated data and a real-data experiment on dependence testing for media annotations demonstrate competitive power against state-of-the-art dependence measures, particularly in settings involving non-linear and structured relationships.