In this work, we propose an efficient two-stage algorithm for the joint problem of correlation detection and partial alignment recovery between two Gaussian databases. Correlation detection is a hypothesis testing problem: under the null hypothesis, the databases are independent, and under the alternative hypothesis, they are correlated under an unknown row permutation. We develop bounds on the type-I and type-II error probabilities and show that the analyzed detector performs better than a recently proposed detector, at least for some specific parameter choices. Because the proposed detector relies on a statistic that is a sum of dependent indicator random variables, bounding the type-I error probability requires a novel graph-theoretic technique, which we develop, for bounding the $k$-th order moments of such statistics. When the databases are accepted as correlated, the algorithm also recovers a partial alignment between them. We also propose two more algorithms: (i) a second algorithm for partial alignment recovery, whose reliability and computational complexity are both higher than those of the first proposed algorithm; and (ii) an algorithm for full alignment recovery, which requires fewer computations than the optimal recovery procedure at the cost of an only slightly higher error probability.
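To make the testing problem concrete, the following minimal Python sketch simulates the two hypotheses and evaluates one possible sum-of-indicators statistic. It is an illustration under stated assumptions, not the detector analyzed in this work: the unit-variance Gaussian model with per-entry correlation `rho`, the inner-product indicator, the threshold `0.5 * rho * d`, and all function and variable names are hypothetical choices made for the example.

```python
import numpy as np

def generate_databases(n, d, rho, correlated, rng):
    """Return (X, Y, perm); perm is None under the null hypothesis.

    Assumed model: rows are i.i.d. standard Gaussian vectors of length d.
    Under the alternative, each row of Y has per-entry correlation rho
    with exactly one row of X, and the rows of Y are then shuffled.
    """
    X = rng.standard_normal((n, d))
    if not correlated:
        return X, rng.standard_normal((n, d)), None
    Z = rng.standard_normal((n, d))
    Y_aligned = rho * X + np.sqrt(1.0 - rho**2) * Z
    perm = rng.permutation(n)
    Y = Y_aligned[perm]  # row j of Y is correlated with row perm[j] of X
    return X, Y, perm

def count_statistic(X, Y, threshold):
    """Count row pairs (i, j) whose inner product exceeds the threshold.

    The count is a sum of n * n dependent indicator variables; the
    pairs that fire double as a candidate partial alignment.
    """
    G = X @ Y.T                # G[i, j] = <X_i, Y_j>
    hits = G > threshold
    return int(hits.sum()), np.argwhere(hits)

rng = np.random.default_rng(0)
n, d, rho = 50, 200, 0.9
threshold = 0.5 * rho * d      # hypothetical choice, halfway to the matched mean

X0, Y0, _ = generate_databases(n, d, rho, False, rng)
X1, Y1, perm = generate_databases(n, d, rho, True, rng)

count_h0, _ = count_statistic(X0, Y0, threshold)
count_h1, pairs = count_statistic(X1, Y1, threshold)

# Number of reported pairs that agree with the hidden permutation.
correct = sum(1 for i, j in pairs if perm[j] == i)
print(count_h0, count_h1, correct)
```

In this sketch the count stays near zero for independent databases and near `n` for strongly correlated ones, so comparing it with a cutoff gives a simple detector, and the pairs that exceed the threshold form a candidate partial alignment. The indicators are dependent because every row of `X` takes part in `n` different inner products, which is why moment bounds for such sums need more care than in the independent case.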