We investigate the problem of detecting dependencies between the components of a high-dimensional vector. Our approach advances the existing literature in two important respects. First, we consider the problem under privacy constraints. Second, instead of testing whether the coordinates are pairwise independent, we aim to determine whether certain pairwise associations between the components (such as all pairwise Kendall's $\tau$ coefficients) do not exceed a given threshold in absolute value. This formulation is motivated by the observation that, in the high-dimensional regime, a null hypothesis under which all pairwise associations are exactly zero is rarely, if ever, realistic. Formulating the null as a composite hypothesis already makes test construction non-standard in the non-private setting. Moreover, under privacy constraints, state-of-the-art procedures rely on permutation approaches, which are invalid under a composite null. We propose a novel bootstrap-based methodology that is especially powerful in sparse settings, develop theoretical guarantees under mild assumptions, and show that the proposed method has good finite-sample properties even in the high-privacy regime. Finally, we present applications to medical data that demonstrate the practical applicability of our methodology.
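To make the composite null concrete, write $\tau_{ij}$ for the Kendall's $\tau$ between components $i$ and $j$ of a $d$-dimensional vector and $\Delta$ for the threshold. The hypotheses then read $H_0: \max_{1 \le i < j \le d} |\tau_{ij}| \le \Delta$ versus $H_1: \max_{1 \le i < j \le d} |\tau_{ij}| > \Delta$. The notation $d$, $\Delta$ and all function names below are assumptions introduced for illustration. The sketch computes the natural plug-in statistic $\max_{i<j} |\hat\tau_{ij}|$, using tau-a with no tie correction. It then privatizes that statistic with the generic Laplace mechanism. This is a swapped-in textbook technique, not the authors' bootstrap procedure, and it does not calibrate a critical value. Under replace-one neighboring datasets, changing a single row alters at most $n-1$ pairs, each by at most 2 in the concordance sum, so each $\hat\tau_{ij}$ (and hence their maximum absolute value) has sensitivity at most $4/n$.

```python
import itertools

import numpy as np


def kendall_tau(x, y):
    """Kendall's tau-a between two 1-D samples (O(n^2), no tie correction).

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Signs of all pairwise differences; concordant pairs give +1, discordant -1.
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    # The full n x n matrix counts each unordered pair twice (diagonal is zero).
    s = np.sum(dx * dy) / 2.0
    return s / (n * (n - 1) / 2.0)


def max_abs_tau(data):
    """Plug-in statistic max_{i<j} |tau_ij| over the columns of an (n, d) array."""
    data = np.asarray(data, dtype=float)
    d = data.shape[1]
    return max(
        abs(kendall_tau(data[:, i], data[:, j]))
        for i, j in itertools.combinations(range(d), 2)
    )


def private_max_abs_tau(data, epsilon, rng=None):
    """epsilon-DP release of max |tau_ij| via the generic Laplace mechanism.

    Assumption: replace-one neighboring datasets, giving sensitivity 4/n.
    This is NOT the bootstrap-based test described in the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = np.asarray(data).shape[0]
    sensitivity = 4.0 / n
    return max_abs_tau(data) + rng.laplace(0.0, sensitivity / epsilon)


# Example: compare the (private) statistic with a hypothetical threshold delta.
X = np.column_stack([np.arange(10), np.arange(10)[::-1], [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]])
delta = 0.3
stat = private_max_abs_tau(X, epsilon=1.0, rng=np.random.default_rng(0))
print(stat - delta)  # positive values hint at H1, but no test level is controlled
```

A large excess over $\Delta$ is only informal evidence against $H_0$. Because $H_0$ is composite, a valid test needs a critical value calibrated under the least favorable configuration. Permutation schemes do not provide this, which is the gap the abstract's bootstrap approach is meant to address.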