Kernel techniques are among the most influential approaches in data science and statistics. Under mild conditions, the reproducing kernel Hilbert space associated with a kernel is capable of encoding the independence of $M\ge 2$ random variables. Probably the most widespread independence measure relying on kernels is the so-called Hilbert-Schmidt independence criterion (HSIC; also referred to as distance covariance in the statistics literature). Although various HSIC estimators have been designed since its introduction close to two decades ago, the fundamental question of the rate at which HSIC can be estimated is still open. In this work, we prove that the minimax optimal rate of HSIC estimation on $\mathbb R^d$ for Borel measures containing the Gaussians with continuous bounded translation-invariant characteristic kernels is $\mathcal O\!\left(n^{-1/2}\right)$. Specifically, our result implies the optimality in the minimax sense of many of the most frequently used estimators (including the U-statistic, the V-statistic, and the Nystr\"om-based one) on $\mathbb R^d$.
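To make the estimators mentioned above concrete, the following is a minimal sketch of the V-statistic estimator of (squared) HSIC for the case $M=2$, written as $\frac{1}{n^2}\operatorname{tr}(KHLH)$ with Gram matrices $K$, $L$ and the centering matrix $H = I - \frac{1}{n}\mathbf 1\mathbf 1^\top$. The Gaussian kernel, the bandwidths, the sample size, and the function names `gaussian_gram` and `hsic_v_statistic` are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel exp(-||a - b||^2 / (2 * bandwidth^2)).

    The Gaussian kernel is one example of a continuous bounded
    translation-invariant characteristic kernel (illustrative choice).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n samples in R^1
    sq_norms = np.sum(x**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def hsic_v_statistic(x, y, bandwidth_x=1.0, bandwidth_y=1.0):
    """V-statistic (biased) estimate of squared HSIC for two samples of size n."""
    n = len(x)
    K = gaussian_gram(x, bandwidth_x)
    L = gaussian_gram(y, bandwidth_y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / n**2


# Toy comparison: independent versus strongly dependent pairs (synthetic data).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 1))
y_indep = rng.normal(size=(n, 1))          # drawn independently of x
y_dep = x + 0.1 * rng.normal(size=(n, 1))  # noisy copy of x

hsic_indep = hsic_v_statistic(x, y_indep)
hsic_dep = hsic_v_statistic(x, y_dep)
print(f"independent: {hsic_indep:.5f}, dependent: {hsic_dep:.5f}")
```

On such synthetic data the estimate for the dependent pair should come out clearly larger than the one for the independent pair. The sketch forms full $n \times n$ Gram matrices, so it costs quadratic memory and is meant only as an illustration.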