Modeling interactions among multimodal, high-dimensional data is intrinsically challenging because of ultra-high dimensionality, complex dependence structures, and high noise levels. Screening methods are effective for reducing dimensionality, but most existing approaches shrink only the predictor space while retaining all outcomes. In cross-modal analyses, different outcomes often select different predictor subsets, so the union of selected predictors remains large and the response dimension is unchanged, which limits the practical benefit of screening and leads to heavy computational burdens and poor interpretability. To address these limitations, we propose a new screening framework, Graph Independence Dual Screening (GIDS), which simultaneously reduces the dimensionality of the response variables and the predictors. We design computationally efficient algorithms that facilitate downstream selection procedures, improving accuracy and scalability, and we establish supporting theoretical results. Extensive simulation studies demonstrate that GIDS outperforms existing methods that screen only predictors. To illustrate its utility, we apply GIDS to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, analyzing interactions between 865,353 genome-wide DNA methylation sites (CpGs) and 49,386 transcriptomic variables. GIDS reduces the feature space to approximately 9,000 CpGs and 2,000 transcripts and uncovers blockwise interaction structures: clusters of CpG sites and gene transcripts with strong associations. These findings not only improve computational tractability but also yield interpretable biological insights, highlighting coordinated regulatory mechanisms underlying Alzheimer's disease.
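The idea of screening both sides at once can be illustrated with a toy example. The sketch below is not the GIDS algorithm, whose details are not given here. It assumes a much simpler rule, ranking each predictor and each response by its largest absolute marginal Pearson correlation with the other modality, and the function name `dual_screen` and the parameters `keep_x` and `keep_y` are hypothetical.

```python
import numpy as np

def dual_screen(X, Y, keep_x, keep_y):
    """Toy dual screening by marginal correlation (illustrative, not GIDS).

    X: (n, p) predictor matrix, Y: (n, q) response matrix.
    Returns sorted indices of the retained predictors and responses.
    """
    n = X.shape[0]
    # Standardize columns so that Xs.T @ Ys / n is the Pearson correlation matrix.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    corr = np.abs(Xs.T @ Ys) / n          # shape (p, q)
    x_score = corr.max(axis=1)            # strongest link of each predictor
    y_score = corr.max(axis=0)            # strongest link of each response
    x_idx = np.sort(np.argsort(x_score)[-keep_x:])
    y_idx = np.sort(np.argsort(y_score)[-keep_y:])
    return x_idx, y_idx

# Simulated data with one planted block: responses 0..4 depend on predictors 0..4.
rng = np.random.default_rng(0)
n, p, q = 500, 200, 100
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
Y[:, :5] += 2.0 * X[:, :5] @ np.ones((5, 5))

x_idx, y_idx = dual_screen(X, Y, keep_x=20, keep_y=20)
print("retained predictors:", x_idx)
print("retained responses:", y_idx)
```

In this simulation both dimensions shrink to 20 variables each, and the planted block should survive on both sides. A predictor-only screen would leave all 100 responses in place.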