Conditional independence tests (CITs) test for conditional dependence between random variables. As existing CITs are limited in their applicability to complex, high-dimensional variables such as images, we introduce deep nonparametric CITs (DNCITs). The DNCITs combine embedding maps, which extract feature representations of high-dimensional variables, with nonparametric CITs applicable to these feature representations. For the embedding maps, we derive general properties on their parameter estimators to obtain valid DNCITs and show that these properties include embedding maps learned through (conditional) unsupervised or transfer learning. For the nonparametric CITs, appropriate tests are selected and adapted to be applicable to feature representations. Through simulations, we investigate the performance of the DNCITs for different embedding maps and nonparametric CITs under varying confounder dimensions and confounder relationships. We apply the DNCITs to brain MRI scans and behavioral traits, given confounders, of healthy individuals from the UK Biobank (UKB), confirming null results from a number of ambiguous personality neuroscience studies with a larger data set and with our more powerful tests. In addition, in a confounder control study, we apply the DNCITs to brain MRI scans and a confounder set to test for sufficient confounder control, leading to a potential reduction in the confounder dimension under improved confounder control compared to existing state-of-the-art confounder control studies for the UKB. Finally, we provide an R package implementing the DNCITs.
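The two-stage construction described above (an embedding map followed by a CIT on the resulting feature representations) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the API of the authors' R package: the function names (`embed`, `residualize`, `dncit_sketch`, `simulate`) are hypothetical, the embedding map is a fixed linear projection standing in for a pretrained feature extractor, the confounder is a single scalar, and the test is a simple residual-permutation test that only detects linear partial association. It is a deliberately simplified stand-in for the nonparametric CITs studied in the paper.

```python
import random
import statistics


def embed(image, weights):
    """Hypothetical fixed embedding map: linear projections of a flattened image."""
    return [sum(w * p for w, p in zip(row, image)) for row in weights]


def residualize(values, z):
    """Residuals of a least-squares fit of `values` on a scalar confounder `z`."""
    mz, mv = statistics.fmean(z), statistics.fmean(values)
    szz = sum((a - mz) ** 2 for a in z)
    slope = sum((a - mz) * (b - mv) for a, b in zip(z, values)) / szz
    return [b - mv - slope * (a - mz) for a, b in zip(z, values)]


def statistic(feature_resids, y_resid):
    """Sum of squared covariances between each residualized feature and y."""
    n = len(y_resid)
    return sum(
        (sum(f * y for f, y in zip(col, y_resid)) / n) ** 2
        for col in feature_resids
    )


def dncit_sketch(images, y, z, weights, n_perm=500, seed=0):
    """Approximate p-value for H0: image independent of y given z.

    Assumes the confounder acts linearly; this is a simplification, not the
    paper's procedure.
    """
    rng = random.Random(seed)
    # Stage 1: map each high-dimensional image to a low-dimensional feature vector.
    feats = [embed(img, weights) for img in images]
    # Stage 2: remove the (assumed linear) confounder effect, then permute.
    cols = [residualize([f[j] for f in feats], z) for j in range(len(weights))]
    y_res = residualize(y, z)
    observed = statistic(cols, y_res)
    exceed = 0
    for _ in range(n_perm):
        perm = y_res[:]
        rng.shuffle(perm)
        if statistic(cols, perm) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


def simulate(n, effect, seed):
    """Toy data: 4-pixel 'images' and a trait y, both driven by a confounder z."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    images = [[zi + rng.gauss(0, 1) for _ in range(4)] for zi in z]
    y = [effect * img[0] + zi + rng.gauss(0, 1) for img, zi in zip(images, z)]
    return images, y, z


if __name__ == "__main__":
    weights = [[1, 0, 0, 0], [0, 1, 0, 0]]  # hypothetical embedding parameters
    for effect in (0.0, 2.0):
        images, y, z = simulate(200, effect, seed=1)
        print(effect, dncit_sketch(images, y, z, weights))
```

With `effect=0.0` the trait depends on the image only through the confounder, so the null hypothesis holds in the simulation; with `effect=2.0` the first pixel influences the trait directly. The embedding parameters are fixed before the test data are seen, loosely mirroring the transfer-learning setting mentioned in the abstract.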