Kernel techniques are among the most influential approaches in data science and statistics. Under mild conditions, the reproducing kernel Hilbert space associated with a kernel is capable of encoding the independence of $M\ge 2$ random variables. Probably the most widespread kernel-based independence measure is the Hilbert-Schmidt independence criterion (HSIC; also referred to as distance covariance in the statistics literature). Despite the various HSIC estimators designed since its introduction close to two decades ago, the fundamental question of the rate at which HSIC can be estimated is still open. In this work, we prove that the minimax optimal rate of HSIC estimation on $\mathbb R^d$, for Borel measures containing the Gaussians, with continuous bounded translation-invariant characteristic kernels, is $\mathcal O\!\left(n^{-1/2}\right)$. Specifically, our result implies the minimax optimality of many of the most frequently used estimators (including the U-statistic, the V-statistic, and the Nystr\"om-based one) on $\mathbb R^d$.
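As a concrete illustration of one of the estimators mentioned above, the following is a minimal sketch of the biased V-statistic HSIC estimator with Gaussian kernels. The function names, the shared bandwidth $\sigma$, and the sample sizes are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    # Gaussian (RBF) kernel matrix from pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic_v(X, Y, sigma=1.0):
    # Biased V-statistic estimator: HSIC_b = trace(K H L H) / n^2,
    # where H = I - (1/n) 1 1^T is the centering matrix.
    n = X.shape[0]
    K = gaussian_kernel(X, sigma)
    L = gaussian_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n**2

# Illustrative comparison: dependent samples should give a larger value
# than independent ones (synthetic data, fixed seed).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y_indep = rng.normal(size=(200, 2))
print(hsic_v(X, X) > hsic_v(X, Y_indep))
```

The V-statistic is always nonnegative and converges at the $\mathcal O\!\left(n^{-1/2}\right)$ rate discussed in the abstract; the unbiased U-statistic variant differs only in excluding diagonal terms from the sums.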