In this paper, we investigate the problem of deciding whether two standard normal random vectors $\mathsf{X}\in\mathbb{R}^{n}$ and $\mathsf{Y}\in\mathbb{R}^{n}$ are correlated. We formulate this as a hypothesis testing problem: under the null hypothesis, the two vectors are statistically independent, while under the alternative, $\mathsf{X}$ and a uniformly random permutation of $\mathsf{Y}$ are correlated with correlation $\rho$. We analyze the thresholds at which optimal testing is information-theoretically impossible and possible, as a function of $n$ and $\rho$. To derive our information-theoretic lower bounds, we develop a novel technique for evaluating the second moment of the likelihood ratio using an orthogonal polynomial expansion, which, among other things, reveals a surprising connection to integer partition functions. We also study a multi-dimensional generalization of this setting, in which we observe two databases/matrices rather than two vectors, and we further allow for partial correlations between the two.
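To make the two hypotheses concrete, the following is a minimal simulation sketch of the observation model as described above. It assumes that, under the alternative, the matched coordinate pairs are i.i.d. bivariate normal with correlation $\rho$; the function name `sample_pair` and its parameters are illustrative choices, not notation from the paper.

```python
import numpy as np

def sample_pair(n, rho, correlated, rng):
    """Draw (X, Y) under the null (correlated=False) or the alternative.

    Hypothetical helper: assumes matched coordinates are i.i.d. bivariate
    normal with correlation rho under the alternative.
    """
    x = rng.standard_normal(n)
    if not correlated:
        # Null: Y is an independent standard normal vector.
        return x, rng.standard_normal(n)
    # Alternative: build a vector correlated with X coordinate by coordinate.
    # Each coordinate is still standard normal marginally.
    z = rng.standard_normal(n)
    y_aligned = rho * x + np.sqrt(1.0 - rho**2) * z
    # Hide the alignment with a uniformly random permutation.
    perm = rng.permutation(n)
    return x, y_aligned[perm]

rng = np.random.default_rng(0)
x, y = sample_pair(1000, 0.9, correlated=True, rng=rng)
```

Under both hypotheses each vector is marginally standard normal, so any test has to exploit the joint structure of $(\mathsf{X},\mathsf{Y})$ without knowing the permutation.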